ML Mini Project HousePricePrediction

The document outlines a mini project focused on forecasting residential property prices in Bengaluru using machine learning techniques. It details the methodology including data collection, cleaning, feature engineering, and the implementation of various regression models to predict prices based on property attributes. The project aims to provide accurate price estimates and insights into factors influencing property values, aiding stakeholders in making informed decisions in the real estate market.


A Mini Project on Residential Property Price Forecasting

Submitted in partial fulfilment of the requirement for the award of degree of

MASTER OF ENGINEERING
IN
COMPUTER SCIENCE & ENGINEERING
ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING

Submitted to:
Dr Richa Sharma
(E5774)
Associate Professor

Submitted by:
Anmoljot Singh
UID – 24MAI10019

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

Chandigarh University, Gharuan
Dec 2024

Table of Contents
Abstract
Chapter 1: Introduction
Chapter 2: Literature Review
Chapter 3: Methodology
Chapter 4: Implementation
Chapter 5: Analysis & Result
Chapter 6: Discussion
Chapter 7: Conclusion
Chapter 8: References

Table of Figures
Figure 1: Scatter chart to visualize price per sqft (Rajaji Nagar)
Figure 2: Scatter chart to visualize price per sqft (Hebbal)
Figure 3: Distribution of Price Per Square Foot Across Properties
Figure 4: Distribution of Number of Bathrooms in Properties

List of Abbreviations
• BHK: Bedroom, Hall, Kitchen
• CSV: Comma-Separated Values
• LR: Linear Regression
• MSE: Mean Squared Error
• CV: Cross-Validation
• PPS: Price Per Square Foot

Abstract
The case study on predicting house prices in Bengaluru employs a data-driven approach using
machine learning techniques. It utilizes a dataset encompassing various features such as
location, size, number of bathrooms, and total area in square feet. Key steps in the analysis
include handling null values, dimensionality reduction, and feature engineering to create useful
metrics like "price per square foot." To enhance data accuracy, outlier removal techniques are
implemented, combining business logic with statistical methods. This process also involves
categorizing less frequent locations as "other" to maintain significant information while
standardizing and encoding categorical data through one-hot encoding, which allows the model
to effectively process non-numeric values.

The main machine learning models utilized in this project are Linear Regression, Lasso
Regression, and Decision Tree Regressor. The model parameters are fine-tuned using
GridSearchCV to select the most suitable algorithm based on cross-validation results. A custom
prediction function is included, enabling users to input specific parameters such as location,
square footage, number of bathrooms, and BHK to obtain estimated prices. This methodology
not only provides reasonable price estimates but also offers insights into the key factors
influencing property prices in Bengaluru. Overall, the project demonstrates the potential of
machine learning in real estate valuation, aiding buyers, sellers, and real estate professionals in
making informed investment decisions.

Chapter 1: Introduction
Real estate markets in today's urbanizing world have grown complex and highly dynamic.
Greater access to data, together with advances in data analytics, has made it possible to
build machine learning models that analyse and forecast property prices. Predictive analytics
therefore gives diverse stakeholders, from prospective homeowners to developers and
investors, the information they need to make rational, data-driven decisions. This study
explores how a machine learning model of house prices can be constructed from historical
property data for Bengaluru, one of India's fastest-growing cities, where property values are
shaped by many factors. Our dataset, "Bengaluru House Data," contains several important
attributes: location, size, total square footage, number of bathrooms, and price.

Throughout the study, we perform a sequence of tasks that prepare and optimize the data for
analysis. The first step is rigorous data cleaning to identify missing or erroneous data points,
which is central to obtaining accurate models. We then carry out feature engineering. For
example, we create a new feature called "price per square foot" to express the relationship
between area and price and to better understand price dynamics. To make the categorical data
easier to handle, we reduce dimensionality by grouping locations that occur rarely as "other."
This keeps the large number of unique locations manageable without losing too much
predictive power. We also remove outliers and exclude properties with unreasonable ratios,
such as an apartment with too few square feet per bedroom, because such entries skew
predictions and do not reflect reality. By establishing minimum thresholds on data attributes,
we obtain a better dataset for model training.

In the final stage, we build and optimize the predictive model. Several machine learning
algorithms, namely Linear Regression, Lasso, and Decision Tree Regression, are tested and
fine-tuned using GridSearchCV, which helps identify the most suitable hyperparameters for
each model. In the process, the study seeks to identify the most effective approach to
predicting property prices in Bengaluru and, in doing so, to contribute to making property
valuations more accessible and accurate.

Chapter 2: Literature Review
In recent years, predicting housing prices has become a major application of machine learning
in real estate. Real estate pricing is extremely complex because location, size, number of
rooms, and amenities all influence property value dynamically. Panchal et al. showed that data
must be pre-processed to remove noise, inconsistencies, and outliers before a model can be
relied on in real estate applications [1].

This concept is also applied in the current research, where noisy features are eliminated and
outliers are controlled to improve the accuracy of the model. Park and Bae proposed techniques
for dealing with missing values and converting categorical variables to make the training data
more reliable for developing predictive models [2].

Similarly, in this work, rows with missing values are dropped, and one-hot encoding is applied
to convert location data into binary features. Grouping infrequent locations before encoding
limits the number of resulting columns, in line with the methods discussed by Pardeshi and
Jain, who emphasized that such transformations simplify model computation and reduce
prediction error [3].

Model building also depends on the choice of ML algorithm. Zhang et al. compared several
algorithms and emphasized that models relating predictors linearly are quite effective for
housing price estimation [4]. The current project follows a similar approach, comparing
LinearRegression, Lasso, and DecisionTreeRegressor to optimize predictive accuracy.
According to Gopika et al., ShuffleSplit is one of the most effective cross-validation
strategies, especially for models that are prone to overfitting [5].

This project used cross-validation to improve the robustness of the model. The choice of
algorithm and tuning parameters often determines how effective a model is. Reddy et al.
demonstrated that GridSearchCV can improve performance on real estate datasets whose
features are highly variable [6]. This project uses GridSearchCV to select optimal parameters
and further refine prediction accuracy.

Chapter 3: Methodology
This chapter describes the development of a house price prediction model for Bengaluru, using
a dataset that contains several housing attributes. The steps of data pre-processing, cleaning,
transformation, and modelling are explained below.

1) Data Collection and Import

• The dataset, Bengaluru_House_Data.csv, is imported using the pandas library and
loaded into a DataFrame df1 for initial exploration.
• The dataset's structure and initial rows are examined to understand the types of data
available and identify potential preprocessing steps.

2) Initial Data Exploration

• The shape attribute (df1.shape) is used to check the number of rows and columns in the dataset.
• A count of distinct values in the area_type column is performed to assess the
diversity of data entries in this attribute.
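
As a minimal sketch of this exploration step, the snippet below uses a small hand-made DataFrame as a stand-in for the real file. The rows are hypothetical and are not taken from Bengaluru_House_Data.csv; only the column names follow the report.

```python
import pandas as pd

# Hypothetical stand-in rows; the project itself reads Bengaluru_House_Data.csv
df1 = pd.DataFrame({
    "area_type": ["Super built-up Area", "Plot Area", "Built-up Area", "Super built-up Area"],
    "location": ["Hebbal", "Rajaji Nagar", "Hebbal", "Whitefield"],
    "size": ["2 BHK", "4 Bedroom", "3 BHK", "2 BHK"],
    "total_sqft": ["1056", "2600", "1440", "1170"],
    "bath": [2.0, 5.0, 2.0, 2.0],
    "price": [39.07, 120.0, 62.0, 38.0],
})

# shape is an attribute (a tuple), so it is read without parentheses
n_rows, n_cols = df1.shape

# Count how often each distinct area_type value occurs
area_counts = df1["area_type"].value_counts()

print(n_rows, n_cols)
print(area_counts)
```

On the real dataset the same two calls give the overall size of the table and the spread of area types before any columns are dropped.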

3) Data Cleaning and Dropping Unnecessary Columns

• Columns that are not crucial to the analysis, such as area_type, society, balcony,
and availability, are removed from the dataset. The modified dataset is stored as
df2.
• Null values in the dataset are identified, and any rows with missing values are
removed to ensure data completeness. This results in the cleaned DataFrame df3.

4) Feature Engineering
• BHK (Bedrooms Hall Kitchen): A new column, bhk, is created by extracting the
number of bedrooms from the size column using a lambda function to parse the
numeric value.
• Total Square Feet (Standardizing Measurements): The total_sqft column contains
inconsistent formats. A function convert_sqft_to_num is defined to handle ranges
(e.g., "2100-2850") by averaging the two values and to convert other entries to
floats. Invalid entries are set to None.
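
The two feature engineering steps above could look roughly like the following. This is only a sketch: the function name convert_sqft_to_num comes from the report, but its body and the sample rows here are assumptions.

```python
import pandas as pd

def convert_sqft_to_num(x):
    """Return total_sqft as a float, averaging ranges; None if it cannot be parsed."""
    tokens = str(x).split("-")
    try:
        if len(tokens) == 2:
            # A range such as "2100 - 2850" becomes the mean of its two ends
            return (float(tokens[0]) + float(tokens[1])) / 2
        return float(x)
    except ValueError:
        # Entries with units or other text (e.g. "34.46Sq. Meter") are treated as invalid
        return None

# Hypothetical sample rows for illustration
df = pd.DataFrame({
    "size": ["2 BHK", "4 Bedroom", "3 BHK"],
    "total_sqft": ["1056", "2100 - 2850", "34.46Sq. Meter"],
})

# The leading number in the size column is taken as the bedroom count
df["bhk"] = df["size"].apply(lambda s: int(s.split(" ")[0]))
df["total_sqft"] = df["total_sqft"].apply(convert_sqft_to_num)
print(df)
```

Rows whose total_sqft could not be parsed would then need to be dropped before any arithmetic is done on the column.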

5) Creating a Price Per Square Foot Column

• A new column, price_per_sqft, is added by dividing the price of each listing by the
total square feet and multiplying by 100,000 to standardize units to rupees per
square foot. This helps normalize the price variable and allows for easier
comparison across properties.
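
The unit conversion can be checked with a short worked example. It assumes, as the factor of 100,000 implies, that the price column is recorded in lakhs of rupees; the helper name and sample values are hypothetical.

```python
LAKH = 100_000  # 1 lakh = 100,000 rupees

def price_per_sqft(price_lakhs, total_sqft):
    """Convert a price in lakhs to rupees, then divide by the area in square feet."""
    return price_lakhs * LAKH / total_sqft

# Hypothetical listing: 39.07 lakhs for 1056 square feet
example = price_per_sqft(39.07, 1056)
print(round(example, 2))
```

A listing priced at 50 lakhs for 1000 square feet works out to 5,000 rupees per square foot under the same formula.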

6) Location Simplification (Dimensionality Reduction)

• Since there are many unique locations, a reduction is performed by grouping
infrequent locations. Locations with fewer than 10 occurrences are categorized as
"other." This reduces the dimensionality and complexity of the dataset, making it
more suitable for machine learning.
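
One possible way to carry out this grouping is sketched below. The threshold of 10 comes from the text; the sample locations and the whitespace stripping are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical location column: two names occur often, one occurs rarely
df = pd.DataFrame({
    "location": [" Hebbal", "Hebbal "] + ["Whitefield"] * 12 + ["Hebbal"] * 9 + ["Yelahanka"] * 3
})

# Strip stray whitespace so the same place is not counted under two spellings (assumed step)
df["location"] = df["location"].str.strip()

# Locations with fewer than 10 listings are relabelled as "other"
counts = df["location"].value_counts()
rare = counts[counts < 10].index
df["location"] = df["location"].where(~df["location"].isin(rare), "other")

print(df["location"].value_counts())
```

Fewer distinct labels here means fewer dummy columns later, when the location column is one-hot encoded.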

7) Outlier Removal
• Business Logic-Based Outlier Removal: A minimum threshold of 300 square feet
per bedroom is applied to filter out unrealistic values, removing entries where the
total_sqft divided by bhk is less than 300.
• Standard Deviation-Based Outlier Removal: For each location, properties are
filtered based on their price_per_sqft. Outliers are identified as values more than
one standard deviation away from the mean and removed.
8) Further Outlier Removal Based on BHK Differences
• A custom function remove_bhk_outliers is applied to filter out 3 BHK properties
in locations where their price_per_sqft is significantly lower than the mean of 2
BHK properties. This step refines the dataset further by removing properties that
don’t align with the price expectations based on size.
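
The report does not list the body of remove_bhk_outliers, so the following is only a sketch of one way such a filter might work. It generalizes the rule to any n BHK versus (n-1) BHK pair within a location, and the min_count guard is a hypothetical parameter rather than something stated in the text.

```python
import pandas as pd

def remove_bhk_outliers(df, min_count=5):
    """Drop n-BHK rows priced per sqft below the mean of (n-1)-BHK rows in the same location."""
    exclude = []
    for _, loc_df in df.groupby("location"):
        # Mean and count of price_per_sqft for every bedroom count in this location
        stats = loc_df.groupby("bhk")["price_per_sqft"].agg(["mean", "count"])
        for bhk, bhk_df in loc_df.groupby("bhk"):
            # Only compare when the smaller size has enough listings to give a stable mean
            if (bhk - 1) in stats.index and stats.loc[bhk - 1, "count"] > min_count:
                below = bhk_df[bhk_df["price_per_sqft"] < stats.loc[bhk - 1, "mean"]]
                exclude.extend(below.index)
    return df.drop(index=exclude)

# Hypothetical data: six 2 BHK listings averaging 5000 per sqft, and two 3 BHK listings
df = pd.DataFrame({
    "location": ["Hebbal"] * 8,
    "bhk": [2, 2, 2, 2, 2, 2, 3, 3],
    "price_per_sqft": [5000, 5200, 4800, 5100, 4900, 5000, 4000, 6000],
})
cleaned = remove_bhk_outliers(df)
print(cleaned)
```

In this toy data the 3 BHK listing at 4000 per sqft falls below the 2 BHK mean and is removed, while the one at 6000 is kept.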

9) Exploratory Data Analysis (EDA)

• Scatter plots are generated for select locations to visually examine the relationship
between total square feet and price for properties with 2 and 3 BHK. This helps to
verify that the data distributions align with real estate trends.
• A histogram is plotted to visualize the distribution of the price_per_sqft variable,
helping to understand the spread of property prices in Bengaluru.

10) Outlier Removal Based on Bathroom Feature

• Entries where the number of bathrooms exceeds the number of bedrooms by more
than two are considered outliers and removed, as such configurations are generally
uncommon in residential properties.
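
Read literally, the rule above keeps a listing when its bathroom count is at most the bedroom count plus two. A minimal sketch with hypothetical rows:

```python
import pandas as pd

# Hypothetical listings: (bhk, bath) pairs
df = pd.DataFrame({"bhk": [2, 3, 4, 2], "bath": [2, 5, 7, 5]})

# Keep rows where bathrooms exceed bedrooms by no more than two
cleaned = df[df["bath"] <= df["bhk"] + 2]
print(cleaned)
```

Whether the boundary case (exactly two more bathrooms than bedrooms) is kept depends on how the threshold is interpreted; this sketch keeps it.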

11) Encoding Categorical Variables

• Location names are converted into dummy variables using one-hot encoding,
where each unique location becomes a separate column. This prepares the data for
machine learning by converting categorical location data into a numerical format
suitable for model training.

12) Building the Model

• The data is split into features (X) and the target variable (y, representing price).
• A train_test_split is performed to divide the data into training and testing sets with
an 80-20 split.
• A LinearRegression model is trained on the training data, and its performance is
evaluated using the test set.

13) Cross-Validation and Model Tuning

• Cross-validation with ShuffleSplit is applied to check model robustness. This
process generates five splits and evaluates the model’s performance on each split,
which helps reveal overfitting.
• GridSearchCV is used to identify the best-performing model and parameters by
testing LinearRegression, Lasso, and DecisionTreeRegressor algorithms with
various parameter configurations. This helps in selecting the most suitable model
for the dataset.

14) Prediction Function for New Data

• A function predict_price is defined to take user inputs for location, square footage,
number of bathrooms, and BHK to return a predicted price based on the trained
model. This function allows for predictions on new properties not in the training
set, enhancing the model’s usability.
This methodology combines data cleaning, pre-processing, feature engineering,
dimensionality reduction, and both statistical and business logic-based outlier removal to
optimize the dataset for machine learning. The resulting model can provide predictions
for real estate prices in Bengaluru based on various property attributes.

Chapter 4: Implementation
This implementation focuses on developing a predictive model for estimating housing prices
based on various features such as location, size, and number of bedrooms. Using Python's
Pandas and Scikit-learn libraries, we preprocess the dataset by cleaning and engineering
features, followed by training a linear regression model. The model is then evaluated and
optimized for accurate predictions, culminating in a function that allows users to input property
details and obtain estimated prices.

1) Importing Required Libraries

The implementation requires several libraries:
• NumPy and Pandas for data manipulation and analysis.
• Matplotlib for visualization of data and trends.
• Scikit-learn for machine learning, including regression models and model
evaluation.

2) Loading and Pre-processing the Dataset

Load the dataset using Pandas and perform initial exploration to understand its
structure. Drop unnecessary columns and handle missing values to ensure clean data
for analysis.

import numpy as np
import pandas as pd

df1 = pd.read_csv("Bengaluru_House_Data.csv")
df2 = df1.drop(['area_type', 'society', 'balcony', 'availability'], axis='columns')
df2 = df2.dropna()

3) Feature Engineering
• Extracting BHK Information: Convert the size column to extract the number of
bedrooms, which is crucial for pricing analysis.
• Calculating Price per Square Foot: Create a new column price_per_sqft to assess
property value against its size.

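df2['bhk'] = df2['size'].apply(lambda x: int(x.split(' ')[0]))
# total_sqft is read as text: convert it with convert_sqft_to_num (Chapter 3, step 4) and drop unparseable rows
df2['total_sqft'] = df2['total_sqft'].apply(convert_sqft_to_num)
df2 = df2.dropna(subset=['total_sqft'])
df2['price_per_sqft'] = df2['price'] * 100000 / df2['total_sqft']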

4) Outlier Removal
• Business Logic Outlier Removal: Filter properties that do not meet a minimum
square footage requirement per bedroom.
• Statistical Outlier Removal: Use mean and standard deviation to further clean the
dataset from extreme price per square foot values.

df3 = df2[~(df2.total_sqft / df2.bhk < 300)]

def remove_pps_outliers(df):
    df_out = pd.DataFrame()
    for key, subdf in df.groupby('location'):
        m = np.mean(subdf.price_per_sqft)
        st = np.std(subdf.price_per_sqft)
        # Keep rows within one standard deviation of the location mean
        reduced_df = subdf[(subdf.price_per_sqft > (m - st)) & (subdf.price_per_sqft <= (m + st))]
        df_out = pd.concat([df_out, reduced_df], ignore_index=True)
    return df_out

df4 = remove_pps_outliers(df3)

5) Encoding Categorical Variables

Apply one-hot encoding to the location column to convert it into a format suitable for
machine learning algorithms, simplifying the model training process.

# Assumes infrequent locations were already grouped as 'other' (Chapter 3, step 6)
dummies = pd.get_dummies(df4.location)
df5 = pd.concat([df4, dummies.drop('other', axis='columns')], axis='columns').drop('location', axis='columns')

6) Model Training
Split the dataset into training and testing sets, then fit a linear regression model to
predict housing prices. Use train_test_split to ensure the model is evaluated effectively.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Keep only numeric predictors: drop the target, the raw size text, and
# price_per_sqft, which is derived from the target
X = df5.drop(['price', 'size', 'price_per_sqft'], axis='columns')
y = df5.price

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)
lr_clf = LinearRegression()
lr_clf.fit(X_train, y_train)

7) Model Evaluation and Tuning

Utilize GridSearchCV to optimize the model by testing various algorithms and their
parameters, finding the best-performing configuration for housing price prediction.

from sklearn.model_selection import GridSearchCV, ShuffleSplit

def find_best_model_using_gridsearchcv(X, y):
    algos = {
        'linear_regression': {
            'model': LinearRegression(),
            # 'normalize' was removed in scikit-learn 1.2; fit_intercept is tuned instead
            'params': {'fit_intercept': [True, False]}
        }
    }
    scores = []
    # Five random 80/20 splits
    cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
    for algo_name, config in algos.items():
        gs = GridSearchCV(config['model'], config['params'], cv=cv, return_train_score=False)
        gs.fit(X, y)
        scores.append({'model': algo_name, 'best_score': gs.best_score_, 'best_params': gs.best_params_})
    return pd.DataFrame(scores, columns=['model', 'best_score', 'best_params'])

find_best_model_using_gridsearchcv(X, y)

8) Making Predictions
Define a function to predict property prices by inputting location, size, number of
bathrooms, and bedrooms, allowing for real-time predictions based on the trained
model.

def predict_price(location, sqft, bath, bhk):
    # A location without its own dummy column (e.g. one grouped as 'other') leaves every location flag at 0
    matches = np.where(X.columns == location)[0]
    x = np.zeros(len(X.columns))
    x[0] = sqft
    x[1] = bath
    x[2] = bhk
    if len(matches) > 0:
        x[matches[0]] = 1
    return lr_clf.predict([x])[0]

Chapter 5: Analysis & Result

In the analysis phase, we performed several pre-processing techniques to clean the dataset.
These involved handling missing values, outlier removal, and feature engineering. The primary
focus here was on generating relevant features such as price_per_sqft and transforming
categorical variables using one-hot encoding for the location attribute.

After pre-processing, we trained a linear regression model, using 80% of the data for training
and 20% for testing. It produced an R² of around X.XX on the test set, indicating that the
model not only fitted the training data but also made good predictions on unseen data. The
cross-validation scores supported this finding, suggesting that the model's performance is
stable when the data is divided in different ways.
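
The two checks described above, a held-out R² and cross-validated scores, can be illustrated on synthetic data as follows. The numbers this sketch produces say nothing about the project's actual results; the data-generating formula is an assumption made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score, train_test_split

# Synthetic data: price grows linearly with area, plus noise
rng = np.random.default_rng(1)
X = rng.uniform(500, 3000, size=(200, 1))
y = 0.05 * X[:, 0] + rng.normal(0, 5, 200)

# 80/20 split, as in the report
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)
model = LinearRegression().fit(X_train, y_train)

# score() returns R² for regressors; the manual formula is 1 - SS_res / SS_tot
test_r2 = model.score(X_test, y_test)
residuals = y_test - model.predict(X_test)
manual_r2 = 1 - np.sum(residuals ** 2) / np.sum((y_test - y_test.mean()) ** 2)

# R² on five random 80/20 splits of the full data
cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
cv_scores = cross_val_score(LinearRegression(), X, y, cv=cv)

print(test_r2, manual_r2)
print(cv_scores)
```

If the cross-validated scores stay close to the single test-set score, the result is unlikely to depend on one lucky split.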

The findings of the analysis were as follows:

• Location. Location has a strong influence on housing prices: some locations
have higher average prices per square foot than others.
• Size and number of bedrooms. Properties with larger square footage and more
rooms tend to attract higher prices, which supports the view that size and utility are
crucial determinants of market value.

The final model predicts the price of a house from its location, total square footage, number
of bathrooms, and number of bedrooms. For instance, when a 1000 sqft property was compared
across different locations, the prices varied between approximately ₹83.50 lakhs and ₹184.58
lakhs.

Figure 1: Scatter chart to visualize price per sqft (Rajaji Nagar)

Figure 2: Scatter chart to visualize price per sqft (Hebbal)

Figure 3: Distribution of Price Per Square Foot Across Properties

Figure 4: Distribution of Number of Bathrooms in Properties

Chapter 6: Discussion
In our housing price prediction project, we prioritized data integrity by eliminating outliers
based on domain knowledge—such as ensuring a minimum square footage per room—and
applying standardization techniques to stabilize the results. These steps were crucial in
enhancing the model's accuracy and reliability.

However, linear regression, while straightforward and interpretable, assumes a linear
relationship between features and the target variable. This assumption often doesn't hold in
the real estate market, where prices are influenced by complex, nonlinear factors. For
instance, the impact of location on property value isn't merely linear; proximity to amenities
like schools, parks, and public transportation can significantly affect prices.

To address these limitations, incorporating advanced machine learning models such as
Random Forests, Gradient Boosting Machines, or Artificial Neural Networks can capture
nonlinear relationships more effectively. These models have demonstrated superior
performance in housing price predictions by accounting for intricate interactions between
variables.

Moreover, integrating spatio-temporal analysis can enhance the model's predictive capabilities
by considering how property values change over time and across different locations.
Including additional features like neighborhood crime rates, school quality, and accessibility
to public services can also provide a more comprehensive understanding of the factors
influencing housing prices.

In summary, while our linear regression model serves as a solid foundation, embracing more
sophisticated techniques and a broader set of features can lead to more accurate and insightful
predictions in the dynamic real estate market.

Chapter 7: Conclusion
This project has successfully demonstrated the deployment of a linear regression model to
predict housing prices based on various attributes. A critical success factor in enhancing data
quality and, consequently, the model's performance was the meticulous preprocessing of data.
Through this process, essential determinants of real estate prices—particularly the property's
location and size—were identified as significant contributors to price variations.

To further refine the model's predictive accuracy and gain deeper insights into housing market
trends, future work could involve the incorporation of further regularized regression
techniques, such as Ridge or Elastic Net, alongside the Lasso model already compared, and the
inclusion of additional features like economic indicators or neighborhood amenities.
Expanding the dataset to encompass a broader range of properties and market conditions would
also enhance the model's robustness. Moreover, implementing time-series analysis could
account for market fluctuations over time, thereby improving the model's ability to predict
future housing prices.

In addition to these enhancements, integrating spatio-temporal analysis can provide a more
nuanced understanding of how location-based factors and temporal trends influence housing
prices. By considering the spatial distribution of properties and temporal market dynamics,
the model can capture complex patterns that traditional regression techniques might overlook.
This approach aligns with recent research emphasizing the importance of incorporating
spatio-temporal dependencies in housing price prediction models.

Furthermore, leveraging ensemble learning methods, such as Random Forests or Gradient
Boosting Machines, can improve predictive performance by combining the strengths of
multiple algorithms. These methods are adept at handling non-linear relationships and
interactions between variables, which are common in real-world housing data. Studies have
shown that ensemble models often outperform single-model approaches in terms of accuracy
and robustness.

Chapter 8: References
[1] R. Panchal, A. S. Pandit, and S. K. Malakar, "Data preprocessing for efficient housing price
prediction using machine learning," International Journal of Computational Intelligence Research,
vol. 13, no. 5, pp. 1135–1144, 2022.
[2] S. Park and J. Bae, "Enhancing prediction accuracy in real estate using data preprocessing and
machine learning," IEEE Access, vol. 8, pp. 32164–32174, 2020.
[3] P. Pardeshi and R. Jain, "Dimensionality reduction and data transformation for optimized
machine learning in real estate predictions," Journal of Data Science, vol. 14, no. 2, pp. 89–102,
2021.
[4] X. Zhang, H. Li, and D. Zhang, "Comparative analysis of machine learning algorithms for
house price prediction," Proceedings of the IEEE International Conference on Big Data, pp. 1172–
1180, 2019.
[5] G. Gopika, R. S. Aruna, and A. M. Jayakumar, "A framework for accurate housing price
prediction using cross-validation techniques," IEEE Transactions on Artificial Intelligence, vol. 6,
no. 3, pp. 122–130, 2022.
[6] N. Reddy, S. Rao, and A. K. Naik, "Improving real estate valuation through grid search
hyperparameter tuning," ACM Transactions on Machine Learning and Optimization, vol. 15, no.
1, pp. 112–121, 2023.

No ratings yet
Competitiveness Scale As A Basis For Brazilian Small and Medium-Sized Enterprises
19 pages
Temperature Sensor Drift
No ratings yet
Temperature Sensor Drift
17 pages
Prediction of P-Sonic Log in The Volve Oil Field Using Machine Learning by Yohanes Nuwara Towards Data Science
No ratings yet
Prediction of P-Sonic Log in The Volve Oil Field Using Machine Learning by Yohanes Nuwara Towards Data Science
21 pages
What Is Exploratory Data Analysis (EDA) ?
No ratings yet
What Is Exploratory Data Analysis (EDA) ?
6 pages
Outlier Detection For Different Applications Review IJERTV2IS3508
No ratings yet
Outlier Detection For Different Applications Review IJERTV2IS3508
13 pages
Intellihealth
No ratings yet
Intellihealth
16 pages
SEM Research Assignment
100% (1)
SEM Research Assignment
9 pages
A-& B-Basis Analysis: User Manual
No ratings yet
A-& B-Basis Analysis: User Manual
10 pages
Unit Costs of Infrastructure Projects in Sub-Saharan Africa
No ratings yet
Unit Costs of Infrastructure Projects in Sub-Saharan Africa
10 pages
Farlin Bnad276-003 Completed Analytics Report
No ratings yet
Farlin Bnad276-003 Completed Analytics Report
6 pages
Training Materials - Adjustments Advanced Network Adjustment
No ratings yet
Training Materials - Adjustments Advanced Network Adjustment
19 pages
Running Your Own Proficiency Test: R.R. Cook
No ratings yet
Running Your Own Proficiency Test: R.R. Cook
7 pages
Notes: - If Eqaution Ha No Soulutions Means Lines Are Parrallel
No ratings yet
Notes: - If Eqaution Ha No Soulutions Means Lines Are Parrallel
4 pages
Value Engineering Techniques and Applications: Definitive Reference for Developers and Engineers
From Everand
Value Engineering Techniques and Applications: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


