House Price Prediction

The document discusses machine learning and its application to house price prediction. It provides background on machine learning, describing it as a subset of artificial intelligence that allows computers to learn from data without being explicitly programmed. The document then discusses the need for accurate house price prediction and the challenges of the current manual process. It proposes using machine learning algorithms to develop a system for predicting house prices based on attributes in order to help buyers, sellers and agents.

CHAPTER ONE

INTRODUCTION

1.1 Background of the Study

Machine learning is a subset of Artificial Intelligence (AI) that attempts to train computers

with new knowledge through the input of data, such as text, images and numerical values,

and to support their interaction with other computer networks. According to (Feggella, 2019),

machine learning is ‘the science of getting computers to learn and act like humans do,

and improve their learning over time in autonomous fashion, by feeding them data and

information in the form of observations and real-world interactions’. It is an application of

artificial intelligence (AI) that gives systems the ability to automatically learn and

improve from experience without being explicitly programmed. Machine learning focuses

on the development of computer programs that can access data and use it to learn

for themselves. The process of learning starts with observations or data, such as examples,

direct experience or instruction, in order to look for patterns in the data and make better

decisions in the future based on the examples provided. The primary aim is to

allow computers to learn automatically, without human intervention or assistance, and to

adjust their actions accordingly.

Machine learning is one of the cutting-edge techniques that can be used to identify,

interpret, and analyze hugely complicated data structures and patterns (Ngiam & Khor,

2019). It is one of the most effective methods for prediction (Harrington, 2018).

A house is a single unit residential building which may range in complexity from

rudimentary hut to a complex structure of wood, masonry, concrete or other material,

outfitted with plumbing, electrical, and heating, ventilation and air-conditioning systems.

Houses use a range of different roofing systems to keep precipitation such as rain from

getting into the dwelling space (Webster, 2021). A house is one of the most important needs

of man. It provides shelter and wellbeing for people; in some cases, some houses depict the

status of a person in the society.

Accurately estimating the price of a house is an important problem for many stakeholders

including house owners, house buyers, agents, creditors, and investors. It is also a difficult

one. Though it is common knowledge that factors such as the size, number of rooms and

location affect the price, there are many other things at play. Additionally, prices are

sensitive to changes in market demand and the peculiarities of each situation, such as

when a property needs to be urgently sold.

One possible explanation for the relative increase in house prices is a simple income

effect, or non-homogeneity in preferences. Changes in preference rates also impact housing

prices (Milley, 2018).

There is some evidence that the increase in housing wealth does not stem from an

increase in the value of houses, but rather from the increase in the value of the land

upon which they are built. First, a price index that includes the value of land, the

Conventional Mortgage Home Price Index, has increased approximately 0.75% faster

than indexes that do not, such as the Census's Composite Construction Cost index, on

an annual basis (Aditya, 2017).

1.2 Statement of the Problem

The current house-buying process is expensive, as the buyer has to pay commission to an agent in

order for the agent to source properties, bid at auction and generally represent the buyer throughout the

buying process.

The manual process of purchasing a house is time consuming (Seidor, 2018). A buyer has

to move from place to place in search of the house he wishes to buy, and the seller or agent may also

have to go in search of an interested buyer; the time taken for both to meet their

expectations will be considerable.

House sellers have to estimate the worth of a house based on its characteristics or

features, in comparison with the existing market prices of similar houses; this is tedious, as

the seller has to go through a series of calculations and considerations.

Manual purchasing of a house is associated with the risk of the interested buyer falling into

the hands of scammers; house scammers pose as house owners or real estate agents,

forge documents and end up duping their victims, the buyers.

1.3 Aim and Objectives of the Study

The aim of this study is to develop a system that uses machine learning algorithms to predict

house prices for house buyers with respect to their budgets and priorities.

The objectives of this project are to:

1. Design a model that predicts the house prices so as to minimize the problems faced

by the customer

2. Train and test the model developed

3. Evaluate the performance of the system using the Mean Absolute Error (MAE), Mean

Squared Error (MSE) and Root Mean Squared Error (RMSE) performance metrics.
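
As an illustration of the three metrics named in objective 3, the following is a minimal Python sketch. The function names and the sample prices are hypothetical and are not taken from the study; they only show how each metric is computed from actual and predicted prices.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the MSE, expressed in the same unit as the prices."""
    return math.sqrt(mean_squared_error(actual, predicted))


# Hypothetical house prices (in millions of naira), for illustration only.
actual_prices = [10.0, 12.0, 15.0, 20.0]
predicted_prices = [11.0, 12.0, 13.0, 24.0]

mae = mean_absolute_error(actual_prices, predicted_prices)
mse = mean_squared_error(actual_prices, predicted_prices)
rmse = root_mean_squared_error(actual_prices, predicted_prices)
print(mae, mse, rmse)
```

Because MSE and RMSE square the errors, they penalize a few large mistakes more heavily than MAE does, which is one reason for reporting all three.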

1.4 Significance of the Study

Predicting house prices is expected to help people who plan to buy a house to know

the likely price range in advance, so that they can plan their finances properly. In addition, house

price predictions are also beneficial for property investors to know the trend of housing

prices.

The system will help customers to invest in a home without approaching an agent. It will save

time and energy, and it will be easily accessible anytime, anywhere. The system will also

save real estate agents the stress of travelling from one place to another in vain.

1.5 Scope of the study

The system will be implemented using machine learning algorithms, which will be used to train

the model. The user will supply a number of house attributes, which the system will

use to predict the price.

The system will be used only within Makurdi metropolis, Benue State, Nigeria. It will be

used by house sellers, buyers and agents. The locations in the system will be those of the

streets in Makurdi.

House sellers will use the system to predict the price and communicate it to a

customer who makes enquiries directly from them; using the system to predict

the price will shorten the time needed to get back to the customer.

Buyers will use the system to find out the price of a house they may be interested in,

which will enable them to plan their budget properly before contacting the seller.

1.6 Definition of Terms

1. House: A house is a single unit residential building which may range in complexity

from rudimentary hut to a complex structure of wood, masonry, concrete or other

material, outfitted with plumbing, electrical, and heating, ventilation and

air-conditioning systems. Houses use a range of different roofing systems to keep

precipitation such as rain from getting into the dwelling space.

2. Price: price is the amount of money that has to be paid to acquire a given product. It

can also be a measure of value.

3. Prediction: prediction is a forecast. It is a statement about the future, sometimes

based on facts and evidence, but not always.

4. Machine Learning: machine learning is a branch of Artificial Intelligence based on

the idea that systems can learn from data, identify patterns and make decisions with

minimal human intervention.

5. Algorithm: An algorithm is a process or set of rules to be followed in calculations or

other problem-solving operations, especially by a computer.

6. Regression: Regression is a supervised machine learning technique which helps in

finding the relationship between variables and enables the prediction of a continuous output

variable based on one or more predictor variables.

CHAPTER TWO

LITERATURE REVIEW

2.1 Overview of Houses

A house is one of the basic needs of human existence. It protects us from the vagaries of nature,

from threats, natural or otherwise. A house provides a sense of security and wellbeing, along

with an economic standing in society. A house is not only a mere physical structure but also a

symbol of power, authority and a host of other things that come along with it. Nowadays a

house is no longer treated as something that is just a shelter but has metamorphosed into a

symbol of economic prosperity, a vulgar display of wealth and a classist expression.

Houses come in various styles, forms and shapes; from mansions to bungalows to terraces.

These different types of houses all have some uniqueness about them. Over the years, some

housing styles have gone in and out of fashion.

As is the case everywhere, safe, affordable housing is a basic necessity for every family (Tracy, 2019).

2.1.1 Types of Houses

Types of houses include terraced houses, duplexes, penthouses, detached buildings, semi-detached

buildings and boys' quarters buildings.

a. Bungalow: A bungalow is typically a one-storey home, cottage or cabin. Bungalows are

generally small in terms of floor area. They are inexpensive to build and

easy to maintain compared to other buildings; as a result, they provide an affordable home

option for both working-class and non-working-class buyers.

b. Penthouse: A penthouse apartment is a unit on the topmost floor of a multi-storey

apartment building. What differentiates a penthouse from other apartments is its luxury

features or elements.

c. Mansion: A mansion is a very big luxury home, often upwards of 5,000 square feet in size.

However, what actually qualifies as a mansion depends on opinion and location. Calling a

home a mansion indicates a level of grandeur, style and quality far above the norm in a

given area. This correlates with the housing patterns of the rich and wealthy. The size of a

property, as well as the number of rooms and bathrooms, all play a part in defining what a

mansion is, but there are further defining features, such as entertainment facilities, leisure

space and luxury finishing. These include grand staircases, crystal chandeliers, big open

foyers, massive gardens, swimming pools, tennis courts, home automation and various

other high-end amenities.

d. Apartments or flats: Apartments or flats are among the most popular forms of housing in

the world today. An apartment building is a combination of many separate homes stacked

on top of and next to each other. Each apartment acts as its own dwelling or living space. As

each apartment forms just a section of the overall building, it offers less privacy than

alternative types of housing. By owning a flat, a person has access to common areas such

as green spaces, playgrounds, rooftops and gyms.

e. Terraced house: A terraced house, townhouse or row house is a single-family home that is

usually set over two or three floors. A terraced house sits side by side with other terraces

(joined together), forming a tight row down a road or block or inside a gated estate.

They are designed to accommodate as many people as possible in densely populated cities.

Terraces are generally more affordable than detached and semi-detached homes.

f. Semi-detached house: A semi-detached house (sometimes called a “semi”) is a single-

family home that shares a single wall with the next house. This style differentiates it from

terraced houses (with shared walls on both sides), and detached houses (with no shared

walls). You can find single-story semidetached homes (bungalows) and semis that are

spread over two floors. Building costs are typically lower than those of a fully detached

house, so you tend to get extra space for your money (or the same space for less).

g. Detached house: Another popular type of house is the detached house. Detached houses

are associated with the rich and famous, and feature some pretty impressive architecture.

Although not as large as mansions, they are often just as luxurious, depending on tastes

and how they are furnished. They are usually spread over two or more floors and feature

their own private gates leading onto a private compound. Detached houses are great for

people who like their privacy and for those with larger families.

h. Duplex: A duplex can be thought of as a house where two different units (on two different

floors) are stacked on one another like apartments. In essence, a duplex is like having two

different houses in the same home. As a rule of thumb, duplexes have at least two floors

and you can find functioning apartments on each floor. Thus, this type of home is perfect

for bigger families. Similar structures with three or four housing units or floors are called

triplexes or fourplexes.

i. Traditional houses: Traditional houses in Nigeria are generally found in the more rural

parts of the country. These houses reflect the traditional house building techniques and

styles of the various ethnic groups of Nigeria. The primary building materials used in

constructing traditional houses include wood, straw, stones and mud. Although they are

far cheaper to build, these homes are often outdated and are far less convenient for modern

life. Consequently, they are continuously being replaced (especially in areas close to

cities) by more modern homes (Kanilelori, 2021).

When a buyer is looking to acquire an already built house, it is important to know which type

of home will fit his or her requirements and finances.

2.2 House Price Prediction

A house is not only a basic human need; today it also represents the wealth and

prestige of a person. Investment in houses generally seems to be profitable because

property values do not decline rapidly. Changes in house prices can affect

household investors, bankers, policymakers and others. Investment in the housing sector

therefore seems to be an attractive choice. The relationship between house prices and

the economy is an important motivating factor for predicting house prices. House price

trends are not only a concern for buyers and sellers; they also indicate the current

economic situation. Therefore, it is important to predict house prices without bias to help

both buyers and sellers make their decisions (Wu, 2017). Thus, the predicted house price is an

important economic indicator. Prediction involves considering some major features of a house and

coming up with an estimate, which may not be exactly the actual price of the house but is

close to it.

A house is based on the idea of subjectivity and mutual connection with the person who lives

in it and within a right architectural scope, resulting in a good arrangement of the internal

function (Zakaria, 2020).

2.2.1 Factors Affecting House Price

Three main factors determine house prices: features, concept and

location. However, house prices can also be explained as a general income function (Imran et al., 2021).

1. Features: The features of a house are its physical attributes; they

include the number of bathrooms, the roof style, the roof material and so on.

They have a great influence on house prices.

2. Concept: The concept of a house is the house style, and can also be the purpose of the

house. A house can be used for many purposes, such as a place to live, for shelter, for

political purposes and so on.

3. Location: The location of a house is the place or position where the house is situated.

Locations range from developed through developing to underdeveloped areas. Houses

located in developed areas tend to have higher prices compared to houses in less

developed areas.

2.3 Regression Algorithms

Regression is a supervised machine learning technique which helps in finding the relationship

between variables and enables the prediction of a continuous output variable based on one or

more predictor variables.

Regression is concerned with specifying the relationship between a single numeric dependent

variable (the value to be predicted) and one or more numeric independent variables (the

predictors). As the name implies the dependent variable depends on the value of the

independent variable or variables. The simplest forms of regression assume that the

relationship between the independent and dependent variables follows a straight line.

It is mainly used for prediction, forecasting, time series modeling, and determining the

cause-and-effect relationship between variables (Nagaraju & Giridhar, 2019).

Regression comprises several algorithms, but for the purpose of this study only six will be

discussed.

1. Linear Regression

Linear Regression is a supervised machine learning model that attempts to model a linear

relationship between a dependent variable ($Y$) and an independent variable ($X$). The

mathematical representation of Linear Regression is:

$$Y = a_0 + a_1 X + \varepsilon$$

In the above equation:

$Y$ = dependent variable

$X$ = independent variable

$a_0$ = intercept of the line, which offers an additional degree of freedom

$a_1$ = linear regression coefficient, which is a scale factor applied to every input value

$\varepsilon$ = random error

When the linear regression algorithm is run, it searches for the best-fit line by adjusting $a_0$

and $a_1$ so that the line fits the actual data points as closely as possible. Once the values of $a_0$

and $a_1$ are known, the model can be used to predict the response (Khushbu & Suniti,

2018).
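
The fitting step described above can be sketched in Python using the ordinary least squares formulas for a single predictor. This is a minimal illustration, assuming one input variable; the function name and the floor-area and price figures are hypothetical and not taken from the study.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (a0, a1) of the least squares line Y = a0 + a1 * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx
    # Intercept: forces the line through the point of means.
    a0 = mean_y - a1 * mean_x
    return a0, a1


# Hypothetical data: floor area (square metres) and price (millions of naira).
areas = [50.0, 80.0, 100.0, 120.0]
prices = [5.0, 8.0, 10.0, 12.0]

a0, a1 = fit_simple_linear_regression(areas, prices)
predicted_price = a0 + a1 * 90.0  # prediction for a 90 square metre house
print(a0, a1, predicted_price)
```

With more than one predictor the same idea applies, but the coefficients are usually obtained with matrix methods or a library routine rather than these two closed-form expressions.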

2. Lasso Regression

The word “LASSO” stands for Least Absolute Shrinkage and Selection Operator. Lasso

regression uses regularization to make predictions. It is often preferred

over other regression methods because it can give accurate predictions with a simpler model. Lasso regression

uses a shrinkage technique, in which the coefficient estimates are shrunk towards a

central point, typically zero. The lasso regression algorithm favours

simple, sparse models (i.e. models with fewer parameters), and is well suited to models or

data showing high levels of multicollinearity, or to cases where we would like to automate certain parts

of model selection, such as variable selection or parameter elimination.

The lasso regression algorithm uses the L1 regularization technique. It is considered

when there is a large number of features, because it automatically performs feature selection

(Melkumova & Shatskikh, 2017).
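
To make the shrinkage and feature selection behaviour concrete, the sketch below fits a lasso model by coordinate descent with soft-thresholding. It assumes the common objective (1/2n) * sum of squared errors + alpha * sum of absolute coefficients, with an unpenalized intercept; this is one standard way to fit lasso and is not claimed to be the procedure used in the study. All names and data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso(rows, targets, alpha, n_iter=200):
    """Coordinate descent for lasso; returns (intercept, weights)."""
    n, n_features = len(rows), len(rows[0])
    # Centre the data so the intercept can be recovered at the end.
    col_means = [sum(r[j] for r in rows) / n for j in range(n_features)]
    y_mean = sum(targets) / n
    X = [[r[j] - col_means[j] for j in range(n_features)] for r in rows]
    y = [t - y_mean for t in targets]
    w = [0.0] * n_features
    for _ in range(n_iter):
        for j in range(n_features):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Prediction from every feature except j.
                other = sum(X[i][k] * w[k] for k in range(n_features) if k != j)
                rho += X[i][j] * (y[i] - other)
                z += X[i][j] ** 2
            rho, z = rho / n, z / n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    intercept = y_mean - sum(w[j] * col_means[j] for j in range(n_features))
    return intercept, w


# Hypothetical data: the first feature drives the target, the second barely does.
rows = [[1.0, 1.0], [2.0, -1.0], [3.0, -1.0], [4.0, 1.0]]
targets = [2.1, 3.9, 5.9, 8.1]

_, plain_weights = fit_lasso(rows, targets, alpha=0.0)       # no penalty
intercept, lasso_weights = fit_lasso(rows, targets, alpha=0.5)
print(plain_weights, lasso_weights)
```

In this example the penalty shrinks the first coefficient and sets the weak second coefficient exactly to zero, which is the automatic feature selection referred to above.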

3. Ridge Regression

Ridge Regression is another type of regression algorithm and is usually considered when

there is a high correlation between the independent variables. In that

situation the least squares estimates remain unbiased, but their variances become

large, so the estimates may be far from the true values. Ridge Regression therefore

adds a small amount of bias to the estimates, through a penalty term in its equation, in order to reduce this variance. It is a useful regression method in

which the model is less susceptible to overfitting, and hence the model works well even if the

dataset is very small (Melkumova & Shatskikh, 2017).
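
The effect of high correlation between predictors can be illustrated with a small Python sketch for two features. It assumes the usual ridge formulation, in which a penalty `lam` is added to the diagonal of the normal equations and the intercept is left unpenalized; the function name and data are hypothetical. With two perfectly collinear features the unpenalized system has no unique solution, while the ridge system does.

```python
def fit_ridge_two_features(rows, targets, lam):
    """Solve (X^T X + lam * I) w = X^T y for two centred features."""
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in rows) / n
    my = sum(targets) / n
    x1 = [r[0] - m1 for r in rows]
    x2 = [r[1] - m2 for r in rows]
    y = [t - my for t in targets]
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    # Adding lam to the diagonal is what makes the system solvable.
    a11, a22 = s11 + lam, s22 + lam
    det = a11 * a22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("singular system: features are perfectly collinear")
    w1 = (a22 * s1y - s12 * s2y) / det
    w2 = (a11 * s2y - s12 * s1y) / det
    intercept = my - w1 * m1 - w2 * m2
    return intercept, w1, w2


# Hypothetical data: the second feature is exactly twice the first.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
targets = [3.0, 5.0, 7.0, 9.0]

intercept, w1, w2 = fit_ridge_two_features(rows, targets, lam=1.0)
print(intercept, w1, w2)

try:
    fit_ridge_two_features(rows, targets, lam=0.0)
    ols_failed = False
except ValueError:
    ols_failed = True  # ordinary least squares cannot separate the two features
```

The ridge coefficients are slightly biased (their combined slope is a little below the true value of 2), which is the price paid for a stable solution.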

4. Decision Tree Regression

Decision tree builds regression or classification models in the form of a tree structure. It

breaks down a dataset into smaller and smaller subsets while at the same time an associated

decision tree is incrementally developed. The final result is a tree with decision

nodes and leaf nodes. A decision node has two or more branches, each representing values

for the attribute tested. A leaf node represents a decision on the numerical target. The topmost

decision node in a tree, which corresponds to the best predictor, is called the root node. Decision

trees can handle both categorical and numerical data (Burcu & Ipek, 2020).
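
The tree-building process described above can be sketched as follows. This is a simplified illustration that assumes numerical features only, chooses each split by minimizing the sum of squared errors, and stops at a fixed depth; the function names, the `max_depth` parameter and the data (bedrooms, floor area, price) are hypothetical.

```python
def sum_squared_error(values):
    """Squared deviation of the values from their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets):
    """Return (error, feature, threshold) of the best split, or None."""
    best = None
    for feature in range(len(rows[0])):
        values = sorted(set(r[feature] for r in rows))
        # Candidate thresholds are midpoints between consecutive values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [t for r, t in zip(rows, targets) if r[feature] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[feature] > threshold]
            error = sum_squared_error(left) + sum_squared_error(right)
            if best is None or error < best[0]:
                best = (error, feature, threshold)
    return best


def build_tree(rows, targets, depth=0, max_depth=2):
    """Decision nodes are dicts; leaf nodes are the mean target value."""
    split = best_split(rows, targets) if depth < max_depth else None
    if split is None:
        return sum(targets) / len(targets)
    _, feature, threshold = split
    left = [i for i, r in enumerate(rows) if r[feature] <= threshold]
    right = [i for i, r in enumerate(rows) if r[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left], [targets[i] for i in left],
                           depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [targets[i] for i in right],
                            depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow decision nodes from the root until a leaf is reached."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical data: [bedrooms, floor area] and price in millions of naira.
rows = [[2, 50], [2, 60], [3, 90], [3, 100], [4, 150], [4, 160]]
targets = [5.0, 6.0, 9.0, 10.0, 15.0, 16.0]

tree = build_tree(rows, targets, max_depth=2)
print(predict(tree, [3, 95]))
```

Each leaf predicts the mean price of the training houses that reach it, so the prediction for a new house is always one of a finite set of values.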

5. Extra Trees Regression

Extremely Randomized Trees, or Extra Trees for short, is an ensemble machine learning

algorithm. The Extra Trees algorithm works by creating a large number of unpruned decision

trees from the training dataset. Predictions are made by averaging the prediction of the

decision trees in the case of regression or using majority voting in the case of classification.

Unlike a conventional decision tree, Extra Trees chooses the split points at random.

This random selection of split points makes the decision trees in the ensemble less correlated,

although this increases the variance of the algorithm. This increase in variance can be

countered by increasing the number of trees used in the ensemble (Ernest et al., 2020).
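
A simplified Python sketch of this idea is given below. It assumes that, at every node, one random threshold is drawn per feature and the best of these random candidates is kept, that trees are grown on the whole training set until the leaves cannot be split further, and that predictions are averaged. The function names, the `n_trees` and `seed` parameters and the data are hypothetical.

```python
import random


def sum_squared_error(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_random_tree(rows, targets, rng):
    """Grow an unpruned tree whose split thresholds are drawn at random."""
    if len(rows) < 2 or len(set(targets)) == 1:
        return sum(targets) / len(targets)
    best = None
    for feature in range(len(rows[0])):
        low = min(r[feature] for r in rows)
        high = max(r[feature] for r in rows)
        if low == high:
            continue  # a constant feature cannot split the node
        threshold = rng.uniform(low, high)  # random cut point, not optimized
        left = [i for i, r in enumerate(rows) if r[feature] <= threshold]
        right = [i for i, r in enumerate(rows) if r[feature] > threshold]
        if not left or not right:
            continue
        error = (sum_squared_error([targets[i] for i in left])
                 + sum_squared_error([targets[i] for i in right]))
        if best is None or error < best[0]:
            best = (error, feature, threshold, left, right)
    if best is None:
        return sum(targets) / len(targets)
    _, feature, threshold, left, right = best
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_random_tree([rows[i] for i in left],
                                  [targets[i] for i in left], rng),
        "right": build_random_tree([rows[i] for i in right],
                                   [targets[i] for i in right], rng),
    }


def predict_tree(tree, row):
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


def fit_extra_trees(rows, targets, n_trees=25, seed=0):
    rng = random.Random(seed)
    return [build_random_tree(rows, targets, rng) for _ in range(n_trees)]


def predict_extra_trees(forest, row):
    """Regression prediction: arithmetic average over all trees."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)


# Hypothetical data: [bedrooms, floor area] and price in millions of naira.
rows = [[2, 50], [2, 60], [3, 90], [3, 100], [4, 150], [4, 160]]
targets = [5.0, 6.0, 9.0, 10.0, 15.0, 16.0]

forest = fit_extra_trees(rows, targets, n_trees=25, seed=0)
new_house_price = predict_extra_trees(forest, [3, 95])
print(new_house_price)
```

Individual trees disagree on unseen houses because their thresholds differ; averaging over more trees smooths out that disagreement.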

6. KNeighbors Regression

In KNeighbors regression, the k-NN algorithm is used for estimating continuous variables.

The algorithm uses a weighted average of the k nearest neighbors, weighted by the inverse of

their distance. KNN regression is a non-parametric method that, in an intuitive manner,

approximates the association between independent variables and the continuous outcome

by averaging the observations in the same neighborhood. The size of the neighborhoods

needs to be set by the analyst or can be chosen using cross-validation to select the size that

minimizes the mean-squared error. The distance to the kth nearest neighbor can also be seen

as a local density estimate and thus is also a popular outlier score in anomaly detection. The

larger the distance to the k-NN, the lower the local density and the more likely the query point is

an outlier. This outlier model, along with another classic data mining method, the local outlier

factor, also works well in comparison to more recent and more complex approaches,

according to a large-scale experimental analysis (Ahmed et al., 2020).
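
The inverse-distance weighted average used in KNeighbors regression can be sketched as follows. This is a minimal illustration that assumes Euclidean distance and returns the stored target when the query coincides with a training point; the function name and the data are hypothetical.

```python
import math


def knn_predict(rows, targets, query, k=3):
    """Predict by averaging the k nearest targets, weighted by 1 / distance."""
    # Pair each training row's distance to the query with its target.
    neighbours = sorted(
        (math.dist(row, query), target) for row, target in zip(rows, targets)
    )[:k]
    # An exact match would give an infinite weight, so return it directly.
    for distance, target in neighbours:
        if distance == 0.0:
            return target
    weights = [1.0 / distance for distance, _ in neighbours]
    weighted_sum = sum(w * target for w, (_, target) in zip(weights, neighbours))
    return weighted_sum / sum(weights)


# Hypothetical data: one feature (number of rooms) and price in millions of naira.
rows = [[1.0], [2.0], [4.0], [7.0]]
targets = [10.0, 20.0, 40.0, 70.0]

price_midway = knn_predict(rows, targets, [3.0], k=2)   # equally far from 2 and 4
price_closer = knn_predict(rows, targets, [3.5], k=2)   # closer to 4 than to 2
print(price_midway, price_closer)
```

When features are measured on different scales (for example rooms and square metres), they are usually rescaled first so that one feature does not dominate the distance.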

2.4 Related Works

Houses have become a necessity in this present age, not only for people looking to buy

a house but also for the people who sell houses. According to (Shinde & Gawande, 2018),

there are different machine learning algorithms for predicting house prices, which many

researchers have compared in their research works and come up with important results.

(Wu, 2017) used 16 principal components as inputs to Support Vector Regression. For the

feature selection experiment, fifteen features were selected. The experimental results showed

that there is no difference between the performance of feature selection and feature

extraction. Both achieved R-square scores of 0.86 after log transformation of the house price. The

best-performing configuration, which achieved the highest R-square, was Support Vector

Regression.

(Alfiyatin et al., 2017) presented a prediction model based on regression analysis and particle

swarm optimization (PSO). Hedonic pricing is implemented using regression techniques to

predict the NJOP price (Dependent Variable) in the city of Malang, based on factors such as

land area, NJOP land price, NJOP building price. PSO is a stochastic optimization technique

used for the selection of the affecting variables. The results obtained show that the Regression

model is more accurate compared to PSO.

(Lu et al., 2018) examined creative feature engineering and proposed a hybrid Lasso and

Gradient boosting regression model that promises better prediction. They used Lasso in

feature selection. They did many iterations of feature engineering to find the optimal number

of features that will improve the prediction performance. Furthermore, they used Lasso for

feature selection to remove the unused features and found that fewer features provide the best

score by running tests on Ridge, Lasso and Gradient boosting.

(Babu & Chandran, 2019) expressed that there is a need to use a mix of models; a linear

model gives a high bias (underfit) whereas a high model complexity-based model gives a

high variance (overfit). The outcome of this study can be used in the annual revision of the

guideline value of land which may add more revenue to the State Government while this

transaction is made.

(Satish et al., 2019) observed that their data set took more than one day to prepare. As

opposed to performing the computations sequentially, various processors could be utilized to

parallelize the computations involved, which might decrease the preparation time and the

prediction time. They also suggested including more functionalities in the model. (Chouthai et al., 2019) used a

data set of 100 houses with several parameters. They used 50 percent of the data set to train

the machine and 50 percent to test it. According to them, the results were

accurate, and they also tested the model with different parameters. Not using PSO makes it easier to

train machines on complex problems, and hence regression is used.

A study was accomplished by (Ahmad & Nawar, 2020), where they did a comparison of

artificial neural network and multiple linear regression for prediction. In their study, the

impact of different morphological measures on live weight has been modelled by artificial

neural networks and multiple linear regression analyses. They used three different back-

propagation techniques for ANN, namely Levenberg-Marquardt, Bayesian regularization,

and Scaled conjugate. They showed that ANN is more successful than multiple linear

regressions in the prediction they performed.

(Kuvalekar et al., 2020) suggests that every single organization in today’s selling business is

operating fruitfully to achieve a competitive edge over alternative competitors. There is a

need to simplify the process for a normal human being while providing the best results. In the

process of developing their model, various retrospective techniques were studied. SVM,

Random Forest, Linear regression, Multiple linear regression, Decision Tree Regressor and

KNN were all tested on the training data sets. However, the decision tree regressor provided high

accuracy in predicting house prices. The decision to choose an algorithm depends largely on

the size and type of the data set used. The decision tree algorithm was well suited to

their data set.

(Truong et al., 2020) investigated different models for housing price prediction. Three

different types of Machine Learning methods including Random Forest, XGBoost, and

LightGBM and two techniques in machine learning including Hybrid Regression and Stacked

Generalization Regression were compared and analyzed for optimal solutions. Even though

all of those methods achieved desirable results, they found out that each model has its

advantages and limitations. The Random Forest method has the lowest error on the training

set but is prone to overfitting. Its time complexity is high since the dataset has to be fit

multiple times. The XGBoost and LightGBM are decent methods when comparing accuracy,

but their time complexities are the best, especially Light GBM. The Hybrid Regression

method is simple but performs a lot better than the three previous methods due to the

generalization. Finally, the Stacked Generalization Regression method has a complicated

architecture, but it is the best choice when accuracy is the top priority. Even though Hybrid

Regression and Stacked Generalization Regression deliver satisfactory results, time

complexity must be taken into consideration since both of them contain Random Forest, a

high time complexity model. Stacked Generalization Regression also has K-fold cross-

validation in its mechanism so it has the worst time complexity.

(Pai & Wang, 2020) employed genetic algorithms to determine the

parameters of machine learning models. Empirical results revealed that attribute

selection for machine learning models in this study does improve the

forecasting accuracy of the models. Thus, least squares support vector

regression with genetic algorithms is a feasible and promising machine learning

technique for forecasting real estate prices.

(Levantesi & Piscopo, 2020) used random forest to predict house prices in London, and

found that, despite the small size of the dataset, the numerical results show an

improvement in prediction by RF with respect to the traditional regression approach based on

Generalized Linear Models. (Warnia & Muhammed, 2020) proposed to use machine learning

and artificial intelligence techniques to develop an algorithm that can predict housing prices

based on certain input features. The business application of this algorithm is that classified

websites can directly use this algorithm to predict prices of new properties that are going to

be listed by taking some input variables and predicting the correct and justified price, i.e.,

they avoid taking price inputs from customers and thus prevent errors from creeping into the

system.

(Sivasankar et al., 2020) compared Random Forest Regression, Decision Tree Regression,

Ridge Regression, LASSO Regression, Ada-Boost Regression, XGBoost Regression

Algorithms, using Scores and Root Mean Square Error(RMSE) and it was found out that the

Decision Tree Regression algorithm has the highest RMSE therefore in that model, it shows

that the Decision tree algorithm can predict more accurately than the other algorithms that

were compared. (Mohd et al., 2020) compared Linear Regression, Decision Tree, Random

Forest, Ridge and Lasso algorithms and found that the best accuracy was provided by the

Random Forest Regressor followed by the Decision Tree Regressor. A similar result is

generated by the Ridge and Linear Regression with a very slight reduction in Lasso. Across

all groups of feature selections, there is no extreme difference between all regardless of

strong or weak groups. It gives a good sign that the buying prices can be solely used for

predicting the selling prices without considering other features, to reduce model

overfitting. Additionally, a reduction in accuracy is apparent in the very weak features group. The

same pattern of results is visible in the Root Mean Square Error (RMSE) for all feature

selections. (Thamarai et al., 2020) experimented with the most fundamental machine learning

algorithms like decision tree classifier, decision tree regression, and multiple linear

regression. Comparatively the performance of multiple linear regression is found to be better

than the decision tree regression in predicting the house prices.

(Priya et al., 2021) considered most of the macroeconomic parameters that affect house

price variation. They used a back propagation neural network (BPN) and a radial basis

function neural network (RBF) to establish a nonlinear model for real estate price

variation prediction. The dataset was taken from Taipei, Taiwan based on leading and

simultaneous economic indices. They considered 11 parameters. The prediction results

obtained from them are compared to public Cathay House Price Index or the Sinyi Home

Price Index. The two error metrics used were Mean Absolute Error (MAE) and Root Mean

Squared Error (RMSE). When the prediction results were compared to Cathay House Price

Index, RBF Neural Network showed better prediction results than BPN Neural Network.

Similarly, for Sinyi Home Price Index BPN Neural Network showed better prediction results

than RBF Neural Network. Some research articles describe the in depth methods and

procedures to collect the real estate data and their pre-processing techniques.

(Peng et al., 2021) used Support Vector Regression, Decision Tree, Regression-Particle Swarm Optimization and LUCE algorithms for prediction. LUCE addresses two critical issues of property valuation: the lack of recent sold prices and the sparsity of house data. Experimental results show that LUCE consistently outperforms prior automated house valuation methods.

(Ho et al., 2021) used 18 years of housing property data to train models using stochastic gradient descent based support vector regression, random forest and gradient boosting machine. They demonstrated that advanced machine learning algorithms can achieve very accurate prediction of property prices, as evaluated by the performance metrics. Given the dataset used in the paper, the main conclusion was that Random Forest and Gradient Boosting Machine generate comparably accurate price estimations, with lower prediction errors than the SVM results.

(Dabreo et al., 2021) also stated the importance of an automated system for purchasing a house. They mentioned how advantageous it would be if buyers did not have to roam around simply for the purpose of buying a house. After comparing the XGBoost, Random Forest, Decision Tree and Linear Regression algorithms, they concluded that XGBoost was the best machine learning algorithm to consider.

Table 1: Summary of Related Works
SN | Author | Title | Algorithm | Result
1 | (Jiao, W. 2017) | Housing Price Prediction Using Support Vector Regression | Support Vector Regression | The experiment result showed that there is no difference between the performance of feature selection and feature extraction.
2 | (Alfiyatin et al., 2017) | Modeling House Price Prediction using Regression Analysis and Particle Swarm Optimization | Regression analysis and Particle Swarm Optimization | Accuracy was tested using Mean Absolute Percentage Error. Regression: 4.84552; PSO: 0.73255. The result shows that the Regression algorithm was more accurate.
3 | (Lu et al., 2018) | A Hybrid Regression Technique for House Prices Prediction | Lasso Regression and Ridge algorithms | RMSE was used for evaluating accuracy. Ridge: 0.112276; Lasso: 0.113838. Which shows that there is no significant difference between the two algorithms.
4 | (Babu & Chandran, 2019) | Literature Review on Real Estate Value Prediction Using Machine Learning | Multiple Regression, Neural Network, Linear Regression, Support Vector Regression, k-Nearest Neighbours, Random Forest Regression | It is found that four factors viz. GLV (84%), silver price per gram (92%), population (86%) and cost of crude oil (88%) have more positive effect on land price.
5 | (Satish et al., 2019) | House Price Prediction Using Machine Learning | Linear Regression, Lasso Regression, Gradient Boosting Regression | The algorithms were tested using accuracy score. Lasso: 76.14994569; Gradient Boosting: 91.27202689; Linear Regression: 76.15709644. The result shows that the Gradient Boosting algorithm has the highest accuracy.
6 | (Ahmad & Nawar, 2020) | House Price Prediction | Multiple linear regression, Lasso Regression, Ridge Regression, Random Forest Regression, Artificial Neural Network | Accuracy was determined using R-square, where the score is closer to 1 the data is more fitted in the model. Multiple Linear: 0.6971; Lasso: 0.6953; Ridge: 0.6966; Random Forest: 0.8555; ANN: 0.6593.
7 | (Kuvalekar et al., 2020) | House Price Forecasting Using Machine Learning | SVM, Random Forest, Linear regression, Multiple linear regression, Decision Tree Regressor, KNN | The Decision tree regressor provided high accuracy in predicting house prices. The decision to choose an algorithm depends largely on the size and type of data in the data used. The decision tree algorithm was well suited for the database.
8 | (Truong et al., 2020) | Housing Price Prediction via Improved Machine Learning Techniques | Random Forest, XGBoost, LightGBM, Hybrid Regression and Stacked Generalization Regression | Using RMSE to evaluate the performance. Random Forest: 0.12980; Extreme Gradient Boosting: 0.16118; LightGBM: 0.16687; Hybrid Regression: 0.149690; Stacked Generalization Regression: 0.16404. Random forest is found to perform best.
9 | (Pai & Wang, 2020) | Using Machine Learning Models and Actual Transaction Data for Predicting Real Estate Prices | Least Squares Support Vector Regression, Classification and Regression Trees, General Regression Neural Networks, Backpropagation Neural Networks | The algorithms were evaluated using the Mean Absolute Percentage Error. Least Squares Support Vector Regression: 1.676; Classification and Regression Trees: 2.2944; General Regression Neural Networks: 22.8936; Backpropagation Neural Networks: 15.0357. BNN is good but LSSVR and CRT
10 | (Levantesi & Piscopo, 2020) | The Importance of Economic Variables on London Real Estate Market: A Random Forest Approach | Random Forest | The Mean Absolute Percentage Error was used to evaluate the performance of the Random Forest model and it was 1.68% which shows that is good for the system.
11 | (Sivasankar et al., 2020) | House Price Prediction | Random Forest Regression, Decision Tree Regression, Ridge Regression, LASSO Regression, Ada-Boost, XGBoost | The threshold value of RMSE was set as 0.12. Random Forest: 0.1356; Decision Tree: 0.2048; Ridge Regression: 0.1179; LASSO Regression: 0.118; Ada-Boost: 0.1707; XGBoost: 0.1135. The algorithms with RMSE less than 0.12 were integrated (Ridge, Lasso, XGBoost regression).
12 | (Mohd et al., 2020) | Machine learning building price prediction with green building determinant | Linear Regression, Decision Tree, Random Forest, Ridge and Lasso algorithms | Random Forest: 0.027; Decision Tree: 0.053; Ridge: 0.048; Linear Regression: 0.048; Lasso: 0.045. Based on RMSE, the Random Forest has the best performance.
13 | (Thamarai et al., 2020) | House Price Prediction Modeling Using Machine Learning | Decision tree classifier, decision tree regression and multiple linear regression | RMSE for Multiple Linear Regression: 2.462792680479472; RMSE Decision Tree: 2.57390753524675; RMSE of Decision tree classifier: 0.7071067811865476. From the result, the multiple linear regression has the less value of RMSE which makes it better than others.
14 | (Priya et al., 2021) | Prediction of Property Price and Possibility Prediction Using Machine Learning | Back propagation neural network (BPN) and Radial basis function neural network (RBF) | Machine learning algorithms have different performance when used in different datasets.
15 | (Peng et al., 2021) | Lifelong Property Price Prediction: A Case Study for the Toronto Real Estate Market | Support vector regression, Decision tree, Regression-Particle Swarm Optimization and LUCE | Experimental results show that LUCE consistently outperforms prior automated house valuation methods.
16 | (Ho et al., 2021) | Predicting property prices with machine learning algorithms | Support Vector Regression, Random Forest and Gradient Boosting Machine | The R squared value was used as the performance metric. Support Vector Machine: 0.82715; Random Forest: 0.90333; Gradient Boosting Machine: 0.90365. The result shows that the GBM is better than the other algorithms.
17 | (Dabreo et al., 2021) | Real Estate Price Prediction | XGBoost, Random Forest, Decision tree, Linear Regression | XGBoost: 3.06; Random Forest: 3.38; Decision tree: 4.189; Linear Regression: 4.22. In general, the best accuracy was provided by the XGBoost.

2.5 Contribution to Knowledge

When people first think of buying a house, they tend to go online and study price trends. They do this to look for a house that contains everything they need, and while doing so they take note of the prices that go with these houses. Automating the process of buying a house is therefore of great importance. Many researchers have used machine learning algorithms to predict the price of houses, but most of them repeatedly used the same algorithms. This system will make use of six (6) machine learning algorithms which, from the literature reviewed, have not been compared simultaneously before. The algorithms to be used are Linear Regression, Lasso, Ridge, Extra Trees Regressor, Decision Tree Regressor and KNeighbors Regressor, with the intention of selecting the most accurate algorithm with the aid of evaluation metrics.

CHAPTER THREE

ANALYSIS AND DESIGN

3.1 System Analysis

System analysis is the study of a business problem domain to recommend improvements and specify the business requirements and priorities for the solution. It involves analyzing and understanding a problem, identifying alternative solutions, choosing the best course of action and then designing the chosen solution. It also involves determining how existing systems work and the problems associated with them. It is worth noting that before a new system can be designed, it is necessary to study the system that is to be improved upon or replaced, if there is any. System analysis is conducted to study a system or its parts in order to identify its objectives. It specifies what the system should do, and it involves the collection of data, the examination of an already existing solution and the building of the logical model of the system.

The research method adopted in this study is the System Development Life Cycle (SDLC). The Waterfall model is the SDLC approach that was used for the software development. In the Waterfall approach, the whole process of development is divided into separate phases, and the outcome of one phase acts as the input for the next phase serially.

3.1.1 Fact Finding

Fact finding is the formal process of collecting data and information about the system. Facts included in any information system can be tested based on three steps: the data set used to create useful information, the process functions that perform the objectives, and the interface designs that interact with users. This study gathers facts through the following methods to analyze the system expectations.

The fact finding technique employed in this study is observation. This study adopts the secondary method of data collection. The dataset used was obtained from the online machine learning repository of Kaggle. The dataset has eighteen (18) attributes, seventeen (17) of which are feature attributes and one (1) of which is the value to be predicted. Samples of 13,321 houses were collected; after cleaning, 6,877 samples were left to train the model. The data is available in a CSV (Comma Separated Values) file that can easily be loaded into the system for training the model.

3.1.2 Analysis of the Existing System

The existing system of buying and selling houses in Makurdi involves manual calculation of house prices, and it also involves the house buyer meeting with the owner or agent one on one. Either the buyer or the seller may have to travel in order to meet with the other to make a decision.

3.1.3 Problems of the Existing System

While searching for a house, the buyer has to contact various estate agents. This is time consuming, because the seller has to calculate the price of the house manually, which may take a considerable amount of time given the factors that have to be put into consideration.

Agents need to be paid a fraction of the amount just for finding a house and setting a price tag for the buyer. In most cases, this price tag is blindly believed by people because they have no other options. There might be cases where the agents and sellers have a secret dealing, and the buyer might be sold an overpriced house without his/her knowledge.

3.1.4 Advantages of the Proposed System

The purpose of this system is to determine the price of a house by looking at the various features that are given as input by the user. These features are given to the machine learning model, which gives out a prediction based on how these features affect the label. This was done by first searching for an appropriate dataset that suits the needs of the developer as well as the user. House prices increase every year, so there is a need for a system to predict house prices. This system can help the seller determine the selling price of a house and can help the customer to arrange the right time to purchase a house.

3.2 Modeling the Proposed System

System modelling is the process of developing abstract models of a system, with each model

presenting a different perspective of the system. It is all about representing a system using

graphical notation. Models help the analyst to understand the functionality of the system.

The Unified Modeling Language (UML) is a general-purpose developmental modeling

language in the field of software engineering, which is intended to provide a standard way to

visualize the design of a system. It can be used to model the structures of an application,

behaviors and even business processes. The central idea behind the usage of UML in this

research is to capture the significant details about the system, such that the problem will be

clearly understood, solution architecture can be developed, and a chosen implementation

scheme can be clearly identified and constructed.

3.2.1 Proposed System Architecture

System architecture is the conceptual model that defines the structure, behaviour and

representation of a system. The architecture of the system is shown in the figure below:

Figure 1: Architecture of the system
The architecture shows that the dataset is first preprocessed, which involves transforming raw data into an understandable format and checking for missing values. Preprocessing removes rows or columns that have missing values due to mistakes that might have occurred when entering the data into the CSV file. This is important because it helps prevent runtime errors, such as Not a Number (NaN) errors, that could prevent the system from working effectively. The dataset is then normalized, which involves rescaling real-valued numeric attributes into the range 0 to 1. The dataset is then divided into training and testing datasets.
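
The preprocessing steps described for the architecture can be sketched as follows. This is only a minimal illustration under stated assumptions: the small table of values is made up and stands in for the real CSV file, and only three of the feature columns from the input design (total_sqft, bath and Age) are used. The scikit-learn classes are the ones listed among the modules of this study.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Made-up sample standing in for the real dataset (assumption, not the study's data).
raw = pd.DataFrame({
    "total_sqft": [1200.0, 850.0, np.nan, 2000.0, 1500.0, 950.0,
                   1750.0, 1100.0, 1300.0, 1600.0, 900.0, 1400.0],
    "bath": [2.0, 1.0, 2.0, 3.0, 2.0, 1.0, 3.0, 2.0, 2.0, np.nan, 1.0, 2.0],
    "Age": [5, 12, 3, 1, 8, 20, 2, 15, 7, 4, 10, 6],
    "price": [45.0, 30.0, 50.0, 95.0, 60.0, 28.0,
              80.0, 40.0, 52.0, 70.0, 31.0, 55.0],
})

# Step 1: remove rows that contain missing values (NaN).
clean = raw.dropna().reset_index(drop=True)

# Step 2: separate the feature attributes from the value to be predicted.
features = clean.drop(columns=["price"])
target = clean["price"]

# Step 3: rescale every numeric feature into the range 0 to 1.
scaler = MinMaxScaler()
features_scaled = scaler.fit_transform(features)

# Step 4: divide the data into training (80%) and testing (20%) datasets.
X_train, X_test, y_train, y_test = train_test_split(
    features_scaled, target, test_size=0.2, random_state=42
)

print(len(raw), "rows before cleaning,", len(clean), "rows after cleaning")
print("training rows:", X_train.shape[0], "testing rows:", X_test.shape[0])
```

The sketch follows the order given in the architecture (normalize, then split). Fitting the scaler on the training split only and reusing it on the testing split is a common alternative, because it keeps information from the testing data out of the scaling step.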

3.2.2 Use Case Diagram

A use case diagram is a representation of a user’s interaction with the system that shows the

relationship between the user and the different use cases in which the user is involved. Use

case diagrams are a way to capture the system’s functionality and requirements in UML

diagrams. They capture the dynamic behavior of a live system. A use case diagram consists of use cases and actors. The system's use case diagram is shown below.

Figure 2: Use case diagram of the system

3.3 System Design

System design is the specification or construction of a technical, computer-based solution for

the business requirements identified in a system analysis. It gives the overall plan or model of

a system consisting of all specifications that give the system its form and structure i.e. the

structural implementation of the system analysis.

The selected architectural design defines all the components that need to be developed,

communications with third party services, user flows and database communications as well as

front-end representations and behavior of each component. The design is usually kept in the

Design Specification Document.

3.3.1 Input Design

Input design is the process of converting a user-oriented description of the input into a computer-based system. Input design facilitates the entry of data into the computer system. In order for the proposed system to perform predictions, it requires specific features of the house to be supplied, as shown in the table below.

Table 2: Input Design

S/N | Field name | Data type | Description
1 | location | object | House location
2 | total_sqft | object | Size of the land where the house is built
3 | LandSlope | object | Slope of property
4 | Housestyle | object | Style of dwelling
5 | Age | int | How long the house has been in existence
6 | RoofStyle | object | Type of roof
7 | RoofMatl | object | Roof material
8 | Exterior1 | object | Exterior covering on house
9 | Watersrc | object | The source of water in the house
10 | Electricity | object | Availability of electricity
11 | bath | float | Number of bathrooms in the house
12 | bhk | object | Number of bedrooms in the house
13 | Garage | object | Presence of a garage
14 | Fence | object | Availability of a fence
15 | price | float | Price of the house
16 | area_type | object | Development level of the area
17 | Exterior2 | object | Exterior covering of the building if more than one

3.3.2 Output Design

A system must have one or more outputs. The output from a system is the justification for its design. The objective of a system is said to be achieved when the output of the system is efficient and accurate. The output of this system is the price predicted from the house features. The price will be displayed on the screen.

Table 3: Output Design

S/N | Field name | Data type | Description
1 | price | float | The predicted price

3.4 Program Design

Program design is the process of translating system requirements into a program that can be executed on a computer system. The program was designed using the Linear Regression machine learning algorithm.

3.4.1 Program architecture

Program architecture refers to the fundamental structures of a software system. A class diagram is a structure diagram that describes the structure of a system by showing the system's classes, their attributes, methods and the relationships among the objects. The class diagram for the proposed system is shown below.

Figure 3: Class diagram of the proposed system

3.4.2 Program flowchart

An algorithm is a step-by-step procedure that defines a set of instructions to be executed in a certain order to get the desired output. The flowchart below represents the algorithm used in the study.

Figure 4: Flowchart for the proposed System

The dataset used to train the model is loaded into the system. The dataset is divided into a training dataset, which is 80% of the total dataset, and a testing dataset, which is the remaining 20%. Training the model involves identifying patterns in the data and creating a function that attempts to map the input features to the output feature. The testing data is used to determine how well the model has learnt from the training dataset and can carry out prediction. The inputs supplied by the user first have to be arranged into a NumPy array before they are passed to the model. The model then uses the function that was created during learning to map the given inputs to an output in the solution space, and sometimes even outside it, depending on the values of the inputs supplied. This output is what the system finally returns.

3.5 Description of Modules

A module is a software component or part of a program that contains one or more routines.

The modules used in this study are explained in the table below.

Table 4: Modules used in the system

SN | Module | Description
1 | Pickle | It is used for saving a program's state data to disk so that the program can resume from where it left off when restarted. It is used for minimizing the execution time.
2 | Pandas | It is used to load and save the dataset.
3 | Numpy | It is the core library for scientific computing in Python.
4 | Train_Test_Split | It splits arrays or matrices into random train and test subsets.
5 | Accuracy_Score | It is used to determine how accurate the classification is.
6 | MinMax Scaler | The ranges of values in the dataset are scaled down to between 0 and 1. It is used to prevent one feature from dominating other features.
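
To illustrate how the Pickle and Numpy modules fit into the prediction flow, the sketch below trains a small model, saves it to disk, loads it again and predicts a price for user inputs arranged as a NumPy array. It is a hedged example, not the code of this study: the training values, the three feature columns and the file name model.pkl are assumptions made for illustration.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data (assumption): columns are [total_sqft, bath, bedrooms].
X_train = np.array([
    [1000.0, 1.0, 2.0],
    [1500.0, 2.0, 3.0],
    [2000.0, 2.0, 4.0],
    [2500.0, 3.0, 4.0],
    [3000.0, 3.0, 5.0],
])
y_train = np.array([30.0, 48.0, 62.0, 80.0, 95.0])

model = LinearRegression().fit(X_train, y_train)

# Save the trained model to disk and load it back, so that the model does not
# have to be retrained every time the application starts.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "model.pkl")  # hypothetical file name
    with open(path, "wb") as handle:
        pickle.dump(model, handle)
    with open(path, "rb") as handle:
        restored = pickle.load(handle)

# User inputs are arranged into a 2-D NumPy array:
# one row per house, one column per feature, in the training column order.
user_input = np.array([[1800.0, 2.0, 3.0]])
predicted_price = restored.predict(user_input)[0]
print("Predicted price:", round(float(predicted_price), 2))
```

The restored model gives the same prediction as the model that was trained in memory, which is why loading a saved model can replace retraining and reduce execution time.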

3.6 Choice of Programming Tools

The programming tools that were used in implementing this project are HTML, CSS,

PyCharm, Python and Jupyter notebook.

a) Hypertext Markup Language: HTML is the standard markup language for

documents designed to be displayed in a web browser. HTML was used to ensure the

proper formatting of text and images so that the web browser will display them as

they are intended to look and it is supported by web browsers.

b) Cascading Style Sheets: CSS is a style sheet language used for describing the

presentation of a document written in a markup language like HTML. CSS was used

to control how the web pages look. CSS was used to control the fonts, texts, colors,

backgrounds, margins and layout.

c) Python: Python is a high-level, interpreted, general-purpose dynamic programming language. The syntax of Python helps programmers code in fewer steps compared to other programming languages. It was used to carry out this study because it supports web applications, has many built-in functions, has extensive support libraries and has user-friendly data structures.

d) PyCharm: PyCharm is a dedicated Python Integrated Development Environment (IDE) providing a wide range of essential tools for Python developers, integrated to create a convenient environment for productive Python, web and data science development. It was used for the HTML, CSS and scripting.

e) Jupyter Notebook: Jupyter notebook is an open source web application that allows

data scientists to create and share documents that integrate live code, equations,

computational output, visualizations, and other multimedia resources, along with

explanatory text in a single document. In this study Jupyter was used to train the

model.

CHAPTER FOUR

IMPLEMENTATION AND RESULT

4.1 Implementation

Implementation involves testing the system to verify if it meets the stated aim and

objectives. It also involves training users to handle the system and plan for a smooth

conversion.

Once the data was cleaned and insights were gained about the dataset, appropriate machine learning models that fit the dataset were applied. Six regression algorithms were selected to predict the dependent variable in the dataset. The selected algorithms are regressors that can be trained to predict continuous values. The six algorithms are Linear Regression, Ridge Regression, Lasso Regression, Extra Trees Regression, KNeighbors Regression and Decision Tree Regression. These algorithms were implemented with the help of Python's scikit-learn library. The predicted outputs obtained from these algorithms were saved in a comma separated values (CSV) file, which was generated by the code at run time.
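
A minimal sketch of how the six regressors could be trained and their predictions written out in CSV form is given below. It assumes synthetic data in place of the cleaned dataset and uses default hyperparameters, since the settings used in the study are not stated; the column names of the output are also illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the cleaned, numerically encoded dataset (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 4))
y = 100.0 * X[:, 0] + 50.0 * X[:, 1] ** 2 + rng.normal(0.0, 5.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The six regression algorithms named in this study, with default settings.
models = {
    "Linear Regression": LinearRegression(),
    "Lasso Regression": Lasso(),
    "Ridge Regression": Ridge(),
    "Extra Trees Regression": ExtraTreesRegressor(random_state=0),
    "Decision Tree Regression": DecisionTreeRegressor(random_state=0),
    "KNeighbors Regression": KNeighborsRegressor(),
}

# Train each model and collect its predictions next to the actual values.
predictions = pd.DataFrame({"actual": y_test})
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)

# Produce the CSV text; passing a file path to to_csv would write a file instead.
csv_text = predictions.to_csv(index=False)
print(csv_text.splitlines()[0])
```

Keeping the actual values and all six sets of predictions in one table makes it straightforward to compute the evaluation metrics for every algorithm from the same file.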

4.2 Program Testing

After the design and coding of an application, it is imperative to run a test to ascertain that the

actual results match the expected results. There are different testing types but for the purpose

of this study, system testing was adopted. The purpose of system testing is to identify

and correct errors in the system.

When the code is executed and the model is trained, an interface is created and connected to the trained model, so that a user can input the features of the desired house. The output is then obtained as a result of the computation from the model.

Test cases were carried out to verify some functionalities of the software. Each test case has

its objective and expected outcome which is represented in the table below.

Table 5: Test Case 1 (Evaluation of Algorithms)


SN | Test Cases | Test Objectives | Expected Outcomes
1 | HPPS1 | To evaluate the performance of the six algorithms using regression performance metrics. | The system should evaluate the performance of the algorithms to show the best to be chosen for the model.

Table 6: Test Case 2 (Testing Software Workability)


SN | Test Cases | Test Objectives | Expected Outcomes
1 | HPPS2 | To test if the system can predict house prices. | The system should output the predicted price of the house.
2 | HPPS3 | To ascertain if a change in feature selection affects the house price. | Change in feature selection should increase or decrease the price.
3 | HPPS3b | To check if a change in location will affect the price of a house: (a) Barracks Rd, (b) GRA, (c) Mechanic Village | The house price should change when the location is changed from one location to the other.
4 | HPPS3c | To check if a change in number of bedrooms will affect the price of a house: (a) 3 bedrooms, (b) 4 bedrooms, (c) 6 bedrooms | The change in number of bedrooms should affect the house price.
5 | HPPS3d | To test if changing the housestyle will also change the price of a house: (a) Penthouse, (b) Bungalow, (c) Duplex | The price of the house should change according to the style of the house.

4.3 Results
The software was executed as specified in Table 5 and Table 6. The outputs were evaluated to determine the performance of the software. The results for the first test, HPPS1, are shown in Table 7.

Four performance metrics were taken into consideration to evaluate the performance of the six algorithms: the R-squared value, Root Mean Squared Error (RMSE), Mean Absolute Error (MAE) and Mean Squared Error (MSE).

The R-squared value determines the proportion of variance in the dependent variable that can be explained by the independent variables. The higher the R-squared value, the better the model.

The Root Mean Squared Error determines how well a regression model fits a dataset; the smaller the value of the RMSE, the better the model.

The Mean Absolute Error is the average of the absolute differences between the predicted values and the actual values. A low MAE is preferred to a higher one.

The Mean Squared Error is also used to determine how close predictions are to actual values; the lower the MSE, the better the model (Umar, 2020).
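
The four metrics can be computed directly from their standard definitions, as in the short sketch below. The function name and the four actual and predicted prices are made up for illustration and are not results from this study.

```python
import math


def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and R-squared for two equal-length sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    ss_res = sum(e * e for e in errors)            # sum of squared errors
    mse = ss_res / n                               # Mean Squared Error
    rmse = math.sqrt(mse)                          # Root Mean Squared Error
    mae = sum(abs(e) for e in errors) / n          # Mean Absolute Error

    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot                     # R-squared

    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}


# Made-up example values (assumption).
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 380.0]

metrics = regression_metrics(actual, predicted)
print(metrics)  # MSE = 175.0, RMSE is about 13.23, MAE = 12.5, R2 = 0.986
```

Because the R-squared value compares the model with a baseline that always predicts the mean of the actual values, it becomes negative when a model performs worse than that baseline. For example, reversing the predictions above to [400.0, 300.0, 200.0, 100.0] gives an R-squared value of -3.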

Table 7: Results for Test Case 1


Algorithms | MSE | RMSE | MAE | R Square
Linear Regression | 3.12 | 558e91.5 | 210e7.07 | -7.08
Lasso Regression | 25181.60 | 158.69 | 57.41 | 0.43
Ridge Regression | 25365.49 | 159.27 | 59.12 | 0.42
Extra Trees Regression | 16233.39 | 128.71 | 49.61 | 0.63
Decision Tree Regression | 21083.89 | 136.44 | 57.53 | 0.52
KNeighbors Regression | 45763.30 | 213.92 | 99.22 | -0.04

From the above table, it is clear that Extra Trees Regression gives the highest R-squared value and low error values.

The test in Table 6 was to ascertain the workability of the software: to determine if the software can actually predict a house price, and also to determine if a change in a feature can alter the price. The results are shown in Table 8.

Table 8: Result for Test Case 2


SN | Test Cases | Test Objectives | Results | References
1 | HPPS2 | To test if the system can predict house prices. | The system predicted the house price and displayed it. | Figure 5
2 | HPPS3 | To ascertain if a change in feature selection affects the house price. | Every change in a feature affected the house price. | Figure 6
3 | HPPS3b | To check if a change in location will affect the price of a house: (a) Barracks Rd, (b) GRA, (c) Mechanic Village | When the locations were changed the house price was also affected. | (a) Figure 7, (b) Figure 8, (c) Figure 9
4 | HPPS3c | To check if a change in number of bedrooms will affect the price of a house: (a) 3 bedrooms, (b) 4 bedrooms, (c) 6 bedrooms | The change in number of bedrooms changed the house price. | (a) Figure 10, (b) Figure 11, (c) Figure 12
5 | HPPS3d | To test if changing the housestyle will also change the price of a house: (a) Penthouse, (b) Bungalow, (c) Duplex | The price of the house changed according to the style that was selected. | (a) Figure 13, (b) Figure 14, (c) Figure 15

Figure 5: Prediction of House Price

Figure 6: Change in House Price due to Change in Selected Features

House prices according to change in Locations

Figure 7: Change in location to Barracks Rd affecting house price

Figure 8: Change in location to G R A affecting house price

Figure 9: Change in location to Mechanic Village affecting house price

Change in House Price according to change in number of Bedrooms

Figure 10: Change in price for three Bedrooms

Figure 11: Change in price for four bedrooms

Figure 12: Change in price for six bedrooms

House Style affecting House Price

Figure 13: Penthouse Effect on House Price

Figure 14: Effect of Bungalow on House Price

Figure 15: Effects of a Duplex on House Price

4.4 Discussion of Results

Based on the results obtained from the tests carried out, the system produced the expected outcomes.

4.4.1 Results from Test Case 1

After comparing the models, it was found that Extra Trees Regression works best. It has the lowest MSE after Linear Regression, the lowest RMSE and MAE, and the highest R-squared value.

Next to Extra Trees Regression is Decision Tree Regression, which also has relatively low MSE, RMSE and MAE values and a relatively high R-squared value. Lasso and Ridge Regression gave almost the same results, which were not as good as those of Extra Trees and Decision Tree. Both R-squared values are below 0.5, which means that each of these models explains less than half of the variance in the house prices.

Linear Regression and KNeighbors Regression performed very poorly. Although the Linear Regression model has the best MSE, it still does not measure up to the other models. Aside from the MSE, KNeighbors still performed better than the Linear Regression model. This means that Linear Regression is the worst model for the data used in this study.

4.4.2 Results from Test Case 2

Considering the result obtained from HPPS2, the system is able to predict house prices and display them for the user to view. This shows that the system is working and has achieved the aim for which it was trained.

In HPPS3, the aim was to test whether a change in feature selection alters the price depending on the features that were selected. The result shows that the system does not merely output a price; it predicts the house price according to the data it was trained with.

In HPPS3b, location as a feature was tested to determine whether the house price changes with each location. Barracks Rd, G R A and Mechanic Village were tested, and for each location selected all other features were left unchanged, in order to be certain that the change was not the result of another feature. It was found that the house price in G R A was higher than the house price in Barracks Rd, while the house price in Mechanic Village was lower than in the other locations. This shows that the same house style with the same features can have different prices in different locations.
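
The procedure of changing one feature while leaving all other features unchanged can be sketched as a small helper. Everything in the example is hypothetical: toy_predict is a stand-in for the trained model, and the price premiums attached to each location are invented purely so that the sketch runs; they are not values from this study.

```python
def vary_one_feature(predict, base_house, feature, values):
    """Predict a price for each value of one feature, keeping the rest fixed."""
    results = {}
    for value in values:
        house = dict(base_house)  # copy, so the base house is never modified
        house[feature] = value
        results[value] = predict(house)
    return results


# Invented premiums per location (assumption, for illustration only).
LOCATION_PREMIUM = {"G R A": 30.0, "Barracks Rd": 15.0, "Mechanic Village": 0.0}


def toy_predict(house):
    """Hypothetical stand-in for the trained model's prediction."""
    return 20.0 + 10.0 * house["bhk"] + LOCATION_PREMIUM[house["location"]]


base_house = {"location": "Barracks Rd", "bhk": 3, "Housestyle": "Bungalow"}

by_location = vary_one_feature(
    toy_predict, base_house, "location",
    ["Barracks Rd", "G R A", "Mechanic Village"],
)
for location, price in by_location.items():
    print(location, price)
```

Copying the base house for every trial guarantees that only the tested feature differs between predictions, so any difference in price can be attributed to that feature alone. The same helper can be reused for the number of bedrooms or the house style by passing a different feature name and list of values.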

Considering the results from HPPS3c, the number of bedrooms was tested and the results showed that the number of bedrooms also affects the house price. The test was carried out for 3 bedrooms, 4 bedrooms and 6 bedrooms; for each number of bedrooms selected, all other features were left unchanged in order to be sure that the change was not the result of other features. It was found that the house price for 6 bedrooms was higher than the house price for 4 bedrooms, while the house price for 3 bedrooms was lower than both. This shows that the same house with the same features in the same location can have different prices when the number of bedrooms is increased or decreased.

HPPS3d was a test for the house style, to determine whether the house price changes with each house style. For the test, a penthouse, a duplex and a bungalow were chosen. The results show that the price of a penthouse is greater than the price of a duplex, while the price of a bungalow is lower than the price of a duplex.

Taken together, the results from the test cases show not only that the system predicts
house prices but also that all the tested features affect the predicted price. In most
cases the system predicts prices in the expected order. For the number of bedrooms, the
price tends to rise when the number of bedrooms is increased and to fall when it is
reduced. For location, the system predicts a higher price when the location is more
developed. For house style, a penthouse receives the highest price and a bungalow the
lowest, which matches real-life prices.
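
The one-feature-at-a-time procedure used in HPPS3b to HPPS3d can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `predict_price` is a hypothetical stand-in for the trained model, and the location and style factors are invented values, not figures from the study. Only the `vary_one_feature` helper reflects the test method itself, which changes one feature and holds all the others at a fixed baseline.

```python
# Hypothetical stand-in for the trained model: NOT the system's real predictor.
LOCATION_FACTOR = {"G R A": 1.5, "Barracks Rd": 1.2, "Mechanic Village": 1.0}  # assumed values
STYLE_FACTOR = {"Penthouse": 1.4, "Duplex": 1.2, "Bungalow": 1.0}  # assumed values


def predict_price(features):
    """Toy price function used only to make the test harness runnable."""
    base = 2_000_000 * features["bedrooms"]
    return base * LOCATION_FACTOR[features["location"]] * STYLE_FACTOR[features["style"]]


def vary_one_feature(predict, baseline, feature, values):
    """Predict a price for each value of one feature, holding the rest fixed."""
    results = {}
    for value in values:
        case = dict(baseline)  # copy so the baseline is never modified
        case[feature] = value
        results[value] = predict(case)
    return results


baseline = {"location": "Barracks Rd", "bedrooms": 4, "style": "Duplex"}

by_location = vary_one_feature(
    predict_price, baseline, "location", ["Barracks Rd", "G R A", "Mechanic Village"]
)
by_bedrooms = vary_one_feature(predict_price, baseline, "bedrooms", [3, 4, 6])

# List the locations from cheapest to most expensive prediction.
for name, price in sorted(by_location.items(), key=lambda item: item[1]):
    print(f"{name}: {price:,.0f}")
```

Because every case starts from a copy of the same baseline, any difference between two predictions can only come from the feature that was varied.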

Considering all the test results from HPPS1 to HPPS3d, it is clear that the system is in
good shape and can be used for the proposed purpose.

4.5 System Requirement

System requirements are the configuration that a system must have for the software to run
efficiently. Failure to meet these requirements can result in performance problems.
System requirements can be considered in terms of hardware and software.

The hardware requirements are:

1. 64-bit PC
2. Hard drive with at least 500 MB of free space

The software requirements are:

1. Web browser
2. Python Flask
3. PyCharm IDE
4. Python interpreter

4.6 User Documentation

User documentation assists users by providing them with clear and comprehensible
information about the software. For the house price prediction system, the steps are:

• Launch the PyCharm IDE
• Run the main.py Python file
• Click on the URL https://127.0.0.1:5001/ ; it opens in a web browser and displays the
  user interface
• Input the house features and click on Predict Price
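
The contents of main.py are not reproduced in this chapter, so the following is only a minimal sketch of what a Flask entry point serving such a form might look like. The form field names, the `/predict` route and the `predict_price` placeholder are all assumptions made for illustration; the real system would call its trained model instead. Note that Flask's built-in development server serves plain HTTP unless an SSL context is configured, so this sketch would be reached at http://127.0.0.1:5001/.

```python
from flask import Flask, request

app = Flask(__name__)

# Hypothetical input form; the field names are assumptions.
FORM = """
<form method="post" action="/predict">
  Location: <input name="location"><br>
  Bedrooms: <input name="bedrooms" type="number"><br>
  House style: <input name="style"><br>
  <button type="submit">Predict Price</button>
</form>
"""


def predict_price(location, bedrooms, style):
    # Placeholder only: the real system would call its trained model here.
    return 1_000_000 * bedrooms


@app.route("/")
def index():
    # Serves the input form shown when the URL is opened in the browser.
    return FORM


@app.route("/predict", methods=["POST"])
def predict():
    # Read the submitted house features and return the predicted price as text.
    location = request.form["location"]
    bedrooms = int(request.form["bedrooms"])
    style = request.form["style"]
    price = predict_price(location, bedrooms, style)
    return f"Predicted price: {price:,.0f}"


def main():
    # Port 5001 matches the port in the URL given in the steps above.
    app.run(host="127.0.0.1", port=5001)
```

Calling `main()` at the bottom of the file starts the server, after which the run console prints the address to open in the browser.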

4.7 System Change Over

The appropriate changeover method for this system is parallel running. In a parallel
running changeover, the new system is started but the old system is kept running alongside
it for a while. All information entered into the old system is also entered into the new
system. The old system is phased out only once the new system has been proven to work.
This is advantageous because if the new system breaks down at any time, the old system
acts as a backup, and the output of the old system can be compared with the output of the
new system to ascertain its accuracy.
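
The comparison of outputs during parallel running could be done along the following lines. This is a sketch under assumptions: the function name `compare_parallel_run` is hypothetical and the price figures are illustrative only, not results from the study. It reports, for each house priced by both systems, how far the automated prediction is from the manual valuation in percentage terms.

```python
def compare_parallel_run(manual_prices, predicted_prices):
    """Return per-house percentage differences and their mean absolute value."""
    if len(manual_prices) != len(predicted_prices):
        raise ValueError("both systems must price the same houses")
    pct_diffs = [
        100.0 * (predicted - manual) / manual
        for manual, predicted in zip(manual_prices, predicted_prices)
    ]
    mean_abs_pct = sum(abs(d) for d in pct_diffs) / len(pct_diffs)
    return pct_diffs, mean_abs_pct


# Illustrative figures only; not results from the study.
manual = [10_000_000, 20_000_000, 40_000_000]
predicted = [11_000_000, 19_000_000, 40_000_000]

diffs, mean_abs = compare_parallel_run(manual, predicted)
for house, diff in enumerate(diffs, start=1):
    print(f"House {house}: {diff:+.1f}%")
print(f"Mean absolute difference: {mean_abs:.1f}%")
```

A small and stable mean difference over the parallel period would support retiring the manual process; what threshold counts as acceptable is a decision for the users and is not fixed here.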

It will be much easier to start by running both the manual price prediction and the
automated price prediction, because many house buyers or real estate agents may at first
find it difficult to rely wholly on the new system, and house buyers who are not computer
literate may find it challenging to input the features.

CHAPTER FIVE

RECOMMENDATION AND CONCLUSION

5.1 CONCLUSION

Improvements in computing technology have made it possible to examine information that
could not previously be captured, processed and analysed, and new analytical techniques
from machine learning can now be applied to it. This study is an exploratory attempt to
use six machine learning algorithms to predict house prices and then compare their results.

The study shows that machine learning algorithms can predict house prices accurately, as
evaluated by the performance metrics. Given the dataset used in this study, the conclusion
is that Extra Trees Regression and Decision Tree Regression generate more accurate price
predictions, with lower prediction errors, than Linear Regression, Lasso Regression,
KNeighbors Regression and Ridge Regression.
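
To make "lower prediction errors" concrete, the sketch below computes three common regression metrics in plain Python: mean absolute error, root mean squared error and the R² score. The appendix imports the corresponding scikit-learn functions; the versions here are hand-written re-implementations for illustration, and the two sets of predictions (`model_a`, `model_b`) are invented numbers, not results from the study.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average size of the prediction errors, in the same unit as the price."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Like MAE, but large errors are penalised more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Share of the price variance explained by the model (1.0 is a perfect fit)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Invented example values, used only to show how two models are compared.
y_true = [100.0, 150.0, 200.0, 250.0]
predictions = {
    "model_a": [110.0, 140.0, 190.0, 260.0],
    "model_b": [130.0, 120.0, 230.0, 220.0],
}

scores = {}
for name, y_pred in predictions.items():
    scores[name] = (
        mean_absolute_error(y_true, y_pred),
        root_mean_squared_error(y_true, y_pred),
        r2_score(y_true, y_pred),
    )
    mae, rmse, r2 = scores[name]
    print(f"{name}: MAE={mae:.1f} RMSE={rmse:.1f} R2={r2:.3f}")
```

A model is preferred when its MAE and RMSE are lower and its R² is closer to 1, which is the sense in which the tree-based models are reported to outperform the others.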

The study has shown that machine learning algorithms are important tools for property
researchers to use in housing price prediction. However, these machine learning tools also
have limitations.

The choice of algorithm depends on a number of factors, such as the size of the data set,
the computing power of the equipment, and the time available to wait for the results.

To conclude, machine learning is very useful for finding the relationships between
attributes and building a model from those relationships. House price prediction can be
done using regression algorithms, which are part of machine learning. House price
prediction helps customers to buy their dream house among the different price variations,
attributes and needs. The algorithm learns relationships from the training data and
applies the result to test data, which will be the user's input. According to the
attributes specified, the plans are provided.

5.2 RECOMMENDATION

Owning a house is something everyone wishes for. With this system, people can buy houses
and real estate at fair prices and avoid being tricked by dishonest agents.

This system is recommended for use in real estate because it will help practitioners set
prices from accurate predictions, saving them time and a lot of hassle. The system is
capable of training itself and predicting prices from the raw data provided to it.

Whenever a large dataset is involved and much of the data is categorical, it is
recommended that tree regressors be used for modelling rather than other algorithms.

REFERENCES
Aditya, A. (2017). Housing wealth and consumption, Vol. 107, No. 17, pp. 3415-3446.
Zakaria, I. A. (2020). How to redefine the Concept of House within the Architectural Modernity Framework. Housing Architecture Research.

APPENDICES
Code for training the model
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Jupyter notebook magic; remove this line when running as a plain script
%matplotlib inline

# Load the dataset and drop the columns that are not used as features
data = pd.read_csv('C:/ML/mkdhouses.csv')
data.drop(columns=['availability', 'society', 'balcony', 'Utilities'], inplace=True)
data['location'].value_counts()

def convertRange(x):
    # Convert a range such as '1000-1200' to its midpoint; return None if not numeric
    temp = x.split('-')
    try:
        if len(temp) == 2:
            return (float(temp[0]) + float(temp[1])) / 2
        return float(x)
    except ValueError:
        return None

# Group locations that have only a few listings under 'other'
data['location'] = data['location'].apply(lambda x: x.strip())
location_count = data['location'].value_counts()
# Not defined in the original listing; a threshold of 10 is assumed from the name
location_count_less_10 = location_count[location_count <= 10]
data['location'] = data['location'].apply(lambda x: 'other' if x in location_count_less_10 else x)

# The 'bhk', numeric 'total_sqft' and 'price_per_sqft' columns are assumed to have
# been derived already; those steps are not part of this listing
data = data[((data['total_sqft'] / data['bhk']) >= 300)]
data.describe()

def remove_outliers_sqft(df):
    # Keep, per location, only rows within one standard deviation of the mean price per sqft
    df_output = pd.DataFrame()
    for key, subdf in df.groupby('location'):
        m = np.mean(subdf.price_per_sqft)
        st = np.std(subdf.price_per_sqft)
        gen_df = subdf[(subdf.price_per_sqft > (m - st)) & (subdf.price_per_sqft <= (m + st))]
        df_output = pd.concat([df_output, gen_df], ignore_index=True)
    return df_output

df1 = remove_outliers_sqft(data)
df1.describe()

def bhk_outlier_remover(df):
    # Drop houses that cost less per sqft than the average house
    # with one bedroom fewer in the same location
    exclude_indices = np.array([])
    for location, location_df in df.groupby('location'):
        bhk_stats = {}
        for bhk, bhk_df in location_df.groupby('bhk'):
            bhk_stats[bhk] = {
                'mean': np.mean(bhk_df.price_per_sqft),
                'std': np.std(bhk_df.price_per_sqft),
                'count': bhk_df.shape[0]
            }
        for bhk, bhk_df in location_df.groupby('bhk'):
            stats = bhk_stats.get(bhk - 1)
            if stats and stats['count'] > 5:
                exclude_indices = np.append(
                    exclude_indices,
                    bhk_df[bhk_df.price_per_sqft < (stats['mean'])].index.values)
    return df.drop(exclude_indices, axis='index')
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression,Lasso,Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error,mean_absolute_error
import math
# Mean absolute error of each model on the test set; y_test and the y_pred_* arrays
# come from the model fitting step, which is not included in this listing
print("LinearRegression: ", mean_absolute_error(y_test, y_pred_lr))
print("Lasso: ", mean_absolute_error(y_test, y_pred_lasso))
print("Ridge: ", mean_absolute_error(y_test, y_pred_ridge))
print("ExtraTreesRegressor: ", mean_absolute_error(y_test, y_pred_extratreesregressor))
print("DecisionTreeRegressor: ", mean_absolute_error(y_test, y_pred_decisiontreeregressor))
print("KNeighborsRegressor: ", mean_absolute_error(y_test, y_pred_kneighborsregressor))
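
The listing imports the scikit-learn pipeline utilities and prints the errors, but the fitting step itself is not shown. The following is a minimal sketch of how those pieces might fit together, not the study's actual training code. The synthetic data, the column names (`location`, `bhk`, `total_sqft`, `price`), the 80/20 split and the choice of two models are all assumptions made so that the example runs on its own.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in data; column names and values are assumptions.
location_premium = {"G R A": 30.0, "Barracks Rd": 15.0, "Mechanic Village": 0.0}
rows = []
for location, premium in location_premium.items():
    for bhk in (2, 3, 4, 5, 6):
        for total_sqft in (900.0, 1500.0, 2100.0):
            price = premium + 10.0 * bhk + 0.02 * total_sqft
            rows.append({"location": location, "bhk": bhk,
                         "total_sqft": total_sqft, "price": price})
df = pd.DataFrame(rows)

X = df.drop(columns=["price"])
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# One-hot encode the categorical column and pass the numeric columns through unchanged.
column_trans = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["location"]),
    remainder="passthrough",
)

models = {
    "LinearRegression": LinearRegression(),
    "ExtraTreesRegressor": ExtraTreesRegressor(n_estimators=50, random_state=0),
}

# Fit each model inside the same preprocessing pipeline and record its test error.
mae = {}
for name, model in models.items():
    pipe = make_pipeline(column_trans, model)
    pipe.fit(X_train, y_train)
    mae[name] = mean_absolute_error(y_test, pipe.predict(X_test))
    print(f"{name}: {mae[name]:.3f}")
```

The synthetic prices are an exact linear function of the features, so linear regression fits them almost perfectly here; this says nothing about how the models rank on the study's real data, where the tree-based models are reported to perform best.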

HTML Code
