Data Mining

The document discusses predicting car prices using machine learning. It describes preprocessing a car dataset by handling missing values, encoding categorical variables, and scaling numerical features. Models like linear regression and decision trees are trained on preprocessed data to predict prices, and their performance is evaluated on test data using metrics like MAE, MSE, and R2.

Uploaded by

SY ECE51 SHEJUL YUVRAJ


Sanjivani College of Engineering

Kopargaon - 423 603.

(Savitribai Phule Pune University, Pune)

Academic Year 2023-24

CIA ACTIVITY
MINI PROJECT
REPORT

Car Price Prediction

Submitted By

Yuvraj Shejul (50)

Mahesh Salpure (49)
Chetan Devre (14)
Pranav Autade (04)

(B. Tech)

Guide:

Prof. S.S.Kulkarni
1) Problem Statement

The car features and price prediction analysis aim to leverage machine learning techniques to predict
car prices based on various features. The objectives include exploring the dataset, preprocessing the
data, building predictive models, and evaluating their performance.

2) Solution

To implement the car price prediction analysis, follow these general steps.
1) Data Preparation: Acquire a comprehensive car dataset with relevant
features.
2) Data Cleaning: Address missing values, outliers, and inconsistencies in the dataset to
ensure the quality of the analysis.
3) Exploratory Data Analysis: Conduct exploratory data analysis to understand the
distribution of key variables, identify patterns, and gain insights into the features that
influence car prices.

3) Dataset Structure:

The dataset is organized into rows, each representing a unique car entry, and columns representing
different attributes associated with the cars. The dataset contains the following columns:

ID: Unique identifier for each car entry.

Price: The target column, representing the price of the car.
Levy: Levy or tax applied to the car's price.
Manufacturer: The brand or manufacturer of the car.
Model: The specific model name or number of the car.
Prod. Year: The year in which the car was manufactured.
Category: The category to which the car belongs (e.g., sedan, SUV, hatchback).
Leather Interior: Whether the car features a leather interior (Yes/No).
Fuel Type: The type of fuel the car uses (e.g., gasoline, diesel).
Engine Volume: The volume of the car's engine in liters; some values carry a "Turbo" suffix.
Mileage: The total distance the car has traveled in kilometers.
Cylinders: The number of cylinders in the car's engine.
Gear Box Type: The type of gear box in the car (e.g., automatic, manual).
Drive Wheels: The type of wheels the car uses for driving (e.g., front-wheel drive, all-wheel drive).
Doors: The number of doors on the car.
Wheel: The side of the steering wheel (Left wheel or Right-hand drive).
Color: The color of the car's exterior.
Airbags: The number of airbags installed in the car for safety.

4) Important Libraries:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("car_price_prediction.csv")
5) Understanding The Dataset:

df.describe().T
df.shape
df.info()
df.duplicated().sum()
df.isnull().sum()

6) Data Preprocessing:

Preprocessing Steps

The preprocessing steps for the car price dataset involve handling
missing values, encoding categorical variables, and scaling numerical features. Here is a
suggested set of preprocessing steps.
1) Handling Missing Values:

Check for any missing values in the dataset and decide on an appropriate
strategy for handling them. This might involve imputation, removal of rows or columns
with missing values, or other techniques depending on the nature of the missing data.
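The two strategies named above, removal and imputation, can be sketched on a toy frame. This is only an illustration: the values are hypothetical, and the choice of median for numbers and mode for categories is one common convention, not necessarily what this project needs.

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical values; the column names mirror the car dataset.
toy = pd.DataFrame({
    "Levy": [1399.0, np.nan, 862.0, np.nan],
    "Fuel type": ["Petrol", "Diesel", None, "Petrol"],
})

# Count the missing values in each column.
missing_counts = toy.isnull().sum()

# Option 1: drop every row that has at least one missing value.
dropped = toy.dropna()

# Option 2: impute, using the median for numbers and the mode for categories.
imputed = toy.copy()
imputed["Levy"] = imputed["Levy"].fillna(imputed["Levy"].median())
imputed["Fuel type"] = imputed["Fuel type"].fillna(imputed["Fuel type"].mode()[0])
```

Dropping rows is simple but discards data, while imputation keeps every row at the cost of introducing estimated values.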

2) Encoding Categorical Variables:

Convert categorical variables, such as Manufacturer, Category, and Fuel type,
into numerical representations. This can be achieved through
one-hot encoding or label encoding. One-hot encoding creates binary columns for each
category, while label encoding assigns a unique numerical label to each category.
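The difference between the two encodings can be seen on a single column. The sketch below uses hypothetical fuel-type values and `pd.factorize` for label encoding, which is one of several ways to do it and not necessarily the approach used later in this report.

```python
import pandas as pd

# Hypothetical values for one categorical column.
toy = pd.DataFrame({"Fuel type": ["Petrol", "Diesel", "Hybrid", "Petrol"]})

# Label encoding: one integer per category (sort=True numbers them alphabetically).
codes, categories = pd.factorize(toy["Fuel type"], sort=True)

# One-hot encoding: one 0/1 column per category.
one_hot = pd.get_dummies(toy["Fuel type"], dtype=int)
```

Label encoding keeps a single column but implies an ordering between categories, while one-hot encoding avoids that ordering at the cost of one extra column per category.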

3) Dropping Unnecessary Columns:

Remove any columns that are not relevant to the analysis or do not contribute
meaningful information.
After performing these preprocessing steps, the dataset will be more suitable for
training machine learning models. The choice of specific preprocessing steps may vary
based on the characteristics of the data and the goals of the analysis.
Code:

1. Remove Duplicates:

df = df.drop_duplicates()

2. Clean Values:

# Clean column names

df.columns = df.columns.str.strip()

# Remove 'km' from mileage

df["Mileage"] = df["Mileage"].str.replace(' km', '').astype(float)

# Handle extra characters in Levy

# A Levy of "-" marks a missing entry; it is treated as 0 here.
# (Replacing the empty string "" would insert a '0' between every character.)
df["Levy"] = df["Levy"].replace("-", "0").astype(float)

# Extract 'With Turbo' information

df['With Turbo'] = df['Engine volume'].str.contains(' Turbo', case=False).astype(int)
df["Engine volume"] = df["Engine volume"].str.replace(' Turbo', '')
df["Engine volume"] = df["Engine volume"].astype(str).astype(float)

3. Handling Missing Values:

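df.isnull().sum() reports no missing values in any feature, so no imputation is needed.
(The "-" placeholders in Levy are handled in the previous step.)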

4. Handling Data Types:

# Convert categorical columns to 'category' type

categorical_columns = ['Manufacturer', 'Category', 'Leather interior', 'Fuel type',
                       'Gear box type', 'Drive wheels', 'Doors', 'Wheel', 'Color', 'Model']
for col in categorical_columns:
    df[col] = df[col].astype('category')

5. Scaling:

from sklearn.preprocessing import MinMaxScaler

scale = MinMaxScaler()
X = df["Mileage"].values.reshape(-1, 1)
scaledX = scale.fit_transform(X)
df["Mileage"] = scaledX

X2 = df["Levy"].values.reshape(-1, 1)
scaledX2 = scale.fit_transform(X2)
df["Levy"] = scaledX2
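
For reference, min-max scaling with the default range maps each value x to (x - min) / (max - min), so the smallest value becomes 0 and the largest becomes 1. A minimal sketch of that formula with NumPy, using hypothetical mileage values:

```python
import numpy as np

# Hypothetical mileage values in kilometers.
mileage = np.array([20000.0, 60000.0, 100000.0, 180000.0])

# Min-max scaling: subtract the minimum, then divide by the range.
scaled = (mileage - mileage.min()) / (mileage.max() - mileage.min())
```

Because the result depends on the observed minimum and maximum, a single extreme value compresses all other values toward 0.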
6. Encoding:

# Label encoding
# Replace values in some columns
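# '04-May' and '02-Mar' are spreadsheet-mangled forms of "4-5" and "2-3" doors.
# map() is assigned back to the column; astype(int) raises if a value is not covered.
df['Doors'] = df['Doors'].map({'04-May': 1, '02-Mar': 2, '>5': 3}).astype(int)
df['Wheel'] = df['Wheel'].map({'Left wheel': 1, 'Right-hand drive': 2}).astype(int)
df['Drive wheels'] = df['Drive wheels'].map({'Front': 1, '4x4': 2, 'Rear': 3}).astype(int)
df['Gear box type'] = df['Gear box type'].map(
    {'Automatic': 1, 'Tiptronic': 2, 'Manual': 3, 'Variator': 4}).astype(int)
df['Leather interior'] = df['Leather interior'].map({'Yes': 1, 'No': 2}).astype(int)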

# One-hot encoding for categorical columns

df_encoded = pd.get_dummies(df, columns=["Manufacturer", "Category", "Color", "Fuel type",
"Model"])
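
# Average Price per Manufacturer, sorted from highest to lowest
# (the sort order is an assumption; the original definition is not shown)
sorted_manufacturer_avg_price = (
    df.groupby('Manufacturer', observed=True)['Price'].mean().sort_values(ascending=False)
)

# Plot a bar plot of average Price by Manufacturer
plt.figure(figsize=(12, 6))
sorted_manufacturer_avg_price.plot(kind='bar', color='skyblue')
plt.xlabel('Manufacturer')
plt.ylabel('Average Price')
plt.title('Average Car Price by Manufacturer')
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()
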
Exploratory Data Analysis (EDA):

# Correlation Heatmap

A heatmap visualizes the correlation between the numerical features in the dataset. Box and bar
plots for the categorical variables follow.
Code:

plt.figure(figsize=(15, 10))
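# numeric_only=True skips non-numeric columns, which would otherwise raise in pandas 2.x
cor = df.corr(numeric_only=True)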
sns.heatmap(cor, annot=True, cmap=plt.cm.Reds)
plt.show()

## 5.1 Price Distribution by Category

plt.figure(figsize=(14, 8))
sns.boxplot(x='Category', y='Price', data=df, palette='viridis')
plt.title('Price Distribution by Category')
plt.xlabel('Category')
plt.ylabel('Price')
plt.show()
## 5.2 Manufacturer vs. Price
plt.figure(figsize=(18, 8))
sns.barplot(x='Manufacturer', y='Price', data=df, palette='coolwarm')
plt.title('Manufacturer vs. Price')
plt.xlabel('Manufacturer')
plt.ylabel('Price')
plt.xticks(rotation=45, ha='right')
plt.show()

# 5.5 Pairplot for Numerical Features ('Price' and 'Levy')

numerical_columns_pairplot = ['Price', 'Levy']

# pairplot creates its own figure, so no plt.figure() call is needed
g = sns.pairplot(df[numerical_columns_pairplot])
g.figure.suptitle('Pairplot for Numerical Features (Price and Levy)', y=1.02)
plt.show()

Modeling:

Building a machine learning model involves several key steps. With the data preprocessed and two
algorithms selected (Linear Regression and Decision Tree Regressor), the general process for
building a model is outlined below.

• Train-Test Split:

Split your dataset into training and testing sets. The training set is used to train the model, while the
testing set is used to evaluate its performance on unseen data.

• Feature Scaling:

Depending on the algorithms you're using, it might be necessary to scale your features. Some
algorithms, like SVM, are sensitive to the scale of input features.

• Model Training:

Train your selected algorithms on the training data.

• Model Evaluation:

Evaluate the performance of your models using metrics relevant to your problem.
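
The four steps above can be sketched end to end as follows. This is a minimal illustration on synthetic data, not the report's own code: `X_demo`, `y_demo`, and the other names are hypothetical, and placing a `StandardScaler` inside a pipeline is one possible choice so that the scaler is fit on the training split only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (hypothetical): a linear target with a little noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

# 1. Train-test split: hold out 30% of the rows for evaluation.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42)

# 2 and 3. Feature scaling and model training in one pipeline.
# fit() learns the scaling parameters from the training rows only.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_tr, y_tr)

# 4. Model evaluation on the unseen test rows.
r2_demo = r2_score(y_te, model.predict(X_te))
```

Fitting the scaler on the full dataset before splitting would leak information from the test rows into training, which the pipeline avoids.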

Code:

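from sklearn.model_selection import train_test_split

# Features and target: Price is the target column.
# Dropping ID as a non-predictive identifier is an assumption.
X = df_encoded.drop(columns=["ID", "Price"])
y = df_encoded["Price"]

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)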

Linear Regression:

from sklearn.linear_model import LinearRegression

linearregression = LinearRegression()
linearregression.fit(X_train, y_train)

y_pred1 = linearregression.predict(X_test)
LinearRegression_score = linearregression.score(X_test, y_test)

Decision Tree Regressor:

from sklearn.tree import DecisionTreeRegressor

DecisiontreeRegressor = DecisionTreeRegressor()
DecisiontreeRegressor.fit(X_train, y_train)

y_pred2 = DecisiontreeRegressor.predict(X_test)
DecisionTreeRegressor_score = DecisiontreeRegressor.score(X_test, y_test)

Model Evaluation:

Linear Regression:

print("LinearRegression score:", LinearRegression_score)

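from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Lists that collect the metrics of each model in turn
MAE, MSE, RMSE, R2 = [], [], [], []

# Calculate evaluation metrics
MAE.append(mean_absolute_error(y_test, y_pred1))
MSE.append(mean_squared_error(y_test, y_pred1))
RMSE.append(np.sqrt(MSE[-1]))
R2.append(r2_score(y_test, y_pred1))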

# Print the evaluation metrics

print("\nLinearRegression model evaluation")
print(f'Mean Absolute Error (MAE): {MAE[-1]:.2f}')
print(f'Mean Squared Error (MSE): {MSE[-1]:.2f}')
print(f'Root Mean Squared Error (RMSE): {RMSE[-1]:.2f}')
print(f'R-squared (R²): {R2[-1]:.2f}\n')

# Result DataFrame
result1 = pd.DataFrame()
result1["y_test"] = y_test
result1["y_predicted"] = y_pred1

Decision Tree Regressor:

print("DecisionTreeRegressor score:", DecisionTreeRegressor_score)

# Calculate evaluation metrics
MAE.append(mean_absolute_error(y_test, y_pred2))
MSE.append(mean_squared_error(y_test, y_pred2))
RMSE.append(np.sqrt(MSE[-1]))
R2.append(r2_score(y_test, y_pred2))

# Print the evaluation metrics

print("\nDecisionTreeRegressor model evaluation")
print(f'Mean Absolute Error (MAE): {MAE[-1]:.2f}')
print(f'Mean Squared Error (MSE): {MSE[-1]:.2f}')
print(f'Root Mean Squared Error (RMSE): {RMSE[-1]:.2f}')
print(f'R-squared (R²): {R2[-1]:.2f}\n')

# Result DataFrame
result2 = pd.DataFrame()
result2["y_test"] = y_test
result2["y_predicted"] = y_pred2

RESULT:

DecisionTreeRegressor score: 0.00019264796448403843

DecisionTreeRegressor model evaluation
Mean Absolute Error (MAE): 10226.77
Mean Squared Error (MSE): 140270221278.28
Root Mean Squared Error (RMSE): 374526.66
R-squared (R²): 0.00
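
The reported RMSE is much larger than the reported MAE. One way this pattern can arise is a small number of very large errors, since squaring amplifies them. The sketch below computes the four metrics by hand on hypothetical prices with one such error; the numbers are illustrative and are not taken from the car dataset.

```python
import numpy as np

# Hypothetical true and predicted prices; the last car is badly underpredicted.
y_true = np.array([10000.0, 12000.0, 15000.0, 500000.0])
y_hat = np.array([11000.0, 11000.0, 14000.0, 20000.0])

errors = y_true - y_hat

# MAE: average absolute error.
mae = np.mean(np.abs(errors))

# MSE and RMSE: squaring makes the single large error dominate.
mse = np.mean(errors ** 2)
rmse = np.sqrt(mse)

# R²: 1 minus the ratio of residual to total sum of squares.
# A value at or below 0 means the model is no better than predicting the mean.
r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)
```

In this toy case three of the four predictions are within 1000 of the true price, yet RMSE is about twice the MAE and R² is negative.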

Conclusion:

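The project explored, preprocessed, and modeled the car features and price dataset, and
implemented both Linear Regression and Decision Tree Regressor models. The reported Decision
Tree Regressor reaches an R² of about 0.00 on the test set, so it explains almost none of the
variance in price, and its RMSE (374526.66) is far larger than its MAE (10226.77), which suggests
that a few very large errors dominate the squared loss. Further analysis and optimization, such as
treating price outliers and tuning the models, are needed to improve performance.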
