Neenopal Data Analysis Task 2

Uploaded by muditchechi03

Neenopal Task

Task 1 - Reading the csv File

In [ ]: pip install sqlalchemy

In [2]: import numpy as np
        import pandas as pd
        from sqlalchemy import create_engine
        import pymysql

In [3]: used_bikes = pd.read_csv(r"E:\Excel\Neenopal\used_bikes.csv")

In [4]: used_bikes.head(30)

Out[4]:    bike_name  price  city  kms_driven  owner  age  power  brand

0   TVS Star City Plus Dual Tone 110cc        35000.0   Ahmedabad  17654.0  First Owner   3.0   110.0  TVS
1   Royal Enfield Classic 350cc               119900.0  Delhi      11000.0  First Owner   4.0   350.0  Royal Enfield
2   Triumph Daytona 675R                      600000.0  Delhi      110.0    First Owner   8.0   675.0  Triumph
3   TVS Apache RTR 180cc                      65000.0   Bangalore  16329.0  First Owner   4.0   180.0  TVS
4   Yamaha FZ S V 2.0 150cc-Ltd. Edition      80000.0   Bangalore  10000.0  First Owner   3.0   150.0  Yamaha
5   Yamaha FZs 150cc                          53499.0   Delhi      25000.0  First Owner   6.0   150.0  Yamaha
6   Honda CB Hornet 160R ABS DLX              85000.0   Delhi      8200.0   First Owner   3.0   160.0  Honda
7   Hero Splendor Plus Self Alloy 100cc       45000.0   Delhi      12645.0  First Owner   3.0   100.0  Hero
8   Royal Enfield Thunderbird X 350cc         145000.0  Bangalore  9190.0   First Owner   3.0   350.0  Royal Enfield
9   Royal Enfield Classic Desert Storm 500cc  88000.0   Delhi      19000.0  Second Owner  7.0   500.0  Royal Enfield
10  Yamaha YZF-R15 2.0 150cc                  72000.0   Bangalore  20000.0  First Owner   7.0   150.0  Yamaha
11  Yamaha FZ25 250cc                         95000.0   Bangalore  9665.0   First Owner   4.0   250.0  Yamaha
12  Bajaj Pulsar NS200                        78000.0   Bangalore  9900.0   First Owner   4.0   200.0  Bajaj
13  Bajaj Discover 100M                       29499.0   Delhi      20000.0  First Owner   8.0   100.0  Bajaj
14  Bajaj Discover 125M                       29900.0   Delhi      20000.0  First Owner   7.0   125.0  Bajaj
15  Bajaj Pulsar NS200 ABS                    90000.0   Bangalore  11574.0  First Owner   3.0   200.0  Bajaj
16  Bajaj Pulsar RS200 ABS                    120000.0  Bangalore  23000.0  First Owner   3.0   200.0  Bajaj
17  Suzuki Gixxer SF 150cc                    48000.0   Mumbai     24725.0  First Owner   5.0   150.0  Suzuki
18  Benelli 302R 300CC                        240000.0  Mumbai     15025.0  Second Owner  3.0   302.0  Benelli
19  Bajaj Discover 125M                       29900.0   Delhi      20000.0  First Owner   7.0   125.0  Bajaj
20  Bajaj Pulsar RS200 ABS                    120000.0  Bangalore  23000.0  First Owner   3.0   200.0  Bajaj
21  Suzuki Gixxer SF 150cc                    48000.0   Mumbai     24725.0  First Owner   5.0   150.0  Suzuki
22  Hero Splendor iSmart Plus IBS 110cc       46500.0   Delhi      3500.0   First Owner   2.0   110.0  Hero
23  Royal Enfield Classic Chrome 500cc        121700.0  Kalyan     24520.0  First Owner   5.0   500.0  Royal Enfield
24  Yamaha FZ V 2.0 150cc                     45000.0   Delhi      23000.0  First Owner   6.0   150.0  Yamaha
25  Bajaj Pulsar NS200                        78000.0   Bangalore  9900.0   First Owner   4.0   200.0  Bajaj
26  Hero Super Splendor 125cc                 20000.0   Ahmedabad  29305.0  First Owner   16.0  125.0  Hero
27  Honda CBF Stunner 125cc                   20800.0   Faridabad  30500.0  Second Owner  7.0   125.0  Honda
28  Bajaj Pulsar 150cc                        50000.0   Bangalore  19000.0  First Owner   8.0   150.0  Bajaj
29  Honda X-Blade 160CC ABS                   81200.0   Mettur     9100.0   First Owner   2.0   160.0  Honda

Memory Usage before changing data types


In [5]: memory_usage_before = used_bikes.memory_usage(deep=True)

In [6]: memory_usage_before

Out[6]:
Index           132
bike_name     11954
price          1192
city           9546
kms_driven     1192
owner         10144
age            1192
power          1192
brand          9517
dtype: int64

Sum of memory usage before changing data types


In [7]: memory_usage_before_sum = used_bikes.memory_usage(deep=True).sum()

In [8]: memory_usage_before_sum  # sum of memory used by each column

Out[8]: 46061

Task 2 - Changing Data Types


First we will check the initial data types of each column.

In [10]: used_bikes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 149 entries, 0 to 148
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 bike_name 149 non-null object
1 price 149 non-null float64
2 city 149 non-null object
3 kms_driven 149 non-null float64
4 owner 149 non-null object
5 age 149 non-null float64
6 power 149 non-null float64
7 brand 149 non-null object
dtypes: float64(4), object(4)
memory usage: 9.4+ KB

As we can see, the price, kms_driven, age and power columns are of float type. We can convert all of these to an integer type, since inspecting the csv file shows that no decimal values are present.

In [11]: used_bikes["price"] = used_bikes["price"].astype(int)
         used_bikes["kms_driven"] = used_bikes["kms_driven"].astype(int)
         used_bikes["age"] = used_bikes["age"].astype(int)
         used_bikes["power"] = used_bikes["power"].astype(int)
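As a side note, `pd.to_numeric` with its `downcast` parameter can choose the smallest integer type that fits each column, which saves even more memory than a plain `astype(int)`. A minimal sketch on hypothetical sample values (not the full dataset):

```python
import pandas as pd

# Hypothetical sample mirroring two of the converted columns
df = pd.DataFrame({"price": [35000.0, 119900.0], "age": [3.0, 4.0]})

# downcast="integer" picks the smallest integer dtype that can hold the
# values: int32 for price (max 119900), int8 for age (max 4)
for col in ["price", "age"]:
    df[col] = pd.to_numeric(df[col], downcast="integer")

print(df.dtypes.to_dict())
```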

In [12]: used_bikes.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 149 entries, 0 to 148
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 bike_name 149 non-null object
1 price 149 non-null int32
2 city 149 non-null object
3 kms_driven 149 non-null int32
4 owner 149 non-null object
5 age 149 non-null int32
6 power 149 non-null int32
7 brand 149 non-null object
dtypes: int32(4), object(4)
memory usage: 7.1+ KB

The data types are changed. As we can see, the memory usage decreased from 9.4+ KB to 7.1+ KB. We will also compare it column by column, as well as by the total memory used.

In [13]: memory_usage_after = used_bikes.memory_usage(deep=True)

In [14]: memory_usage_after_sum = used_bikes.memory_usage(deep=True).sum()

In [15]: memory_usage_after_sum

Out[15]: 43677

In [16]: memory_usage_after

Out[16]:
Index           132
bike_name     11954
price           596
city           9546
kms_driven      596
owner         10144
age             596
power           596
brand          9517
dtype: int64
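To put the two totals in perspective, the saving can be computed directly. As a hypothetical further step (not performed in this notebook), repetitive object columns such as owner, city and brand would shrink much more if converted to the pandas category dtype; both points sketched below:

```python
import pandas as pd

# Totals copied from the Out[8] and Out[15] cells above
before, after = 46061, 43677
saving_pct = 100 * (before - after) / before
print(f"saved {saving_pct:.1f}%")  # about 5.2%

# Hypothetical further step: a repetitive object column, with only two
# distinct values across 149 rows, stored as category instead of object
owner = pd.Series(["First Owner"] * 100 + ["Second Owner"] * 49)
obj_bytes = owner.memory_usage(deep=True)
cat_bytes = owner.astype("category").memory_usage(deep=True)
print(obj_bytes > cat_bytes)  # category uses far fewer bytes here
```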

Task 3 - Dumping Data into Sql


There are many techniques to establish a connection between Python and MySQL. I will be using SQLAlchemy to establish the connection.
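For reference, a SQLAlchemy connection URL follows a fixed anatomy: dialect+driver, credentials, host and database name. The values below are placeholders for illustration, not the actual credentials or database name used in this notebook:

```python
# Anatomy of the URL: dialect+driver://user:password@host/database
# (all four values here are placeholders)
user, password, host, database = "root", "my_password", "localhost", "my_database"
url = f"mysql+pymysql://{user}:{password}@{host}/{database}"
print(url)
```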

In [17]: database_connection = create_engine("mysql+pymysql://root:Yaadnhihai_1629@localhost/neen

In [18]: connection = database_connection.connect()

Now that the connection is established, I will load the data into the SQL database.

In [19]: table_name = "used_bikes"  # must be the table name as a string, not the DataFrame itself

In [20]: column_name = "bike_name"
         key_length = 50

In [21]: # Drop the existing index if it is present. Note that MySQL does not
         # support DROP INDEX IF EXISTS, so any error is caught and ignored.

         try:
             connection.execute(f"DROP INDEX IF EXISTS {column_name}_index ON {table_name}")
         except Exception:
             pass

In [22]: used_bikes.head()

Out[22]: bike_name price city kms_driven owner age power brand

0 TVS Star City Plus Dual Tone 110cc 35000 Ahmedabad 17654 First Owner 3 110 TVS

1 Royal Enfield Classic 350cc 119900 Delhi 11000 First Owner 4 350 Royal Enfield

2 Triumph Daytona 675R 600000 Delhi 110 First Owner 8 675 Triumph

3 TVS Apache RTR 180cc 65000 Bangalore 16329 First Owner 4 180 TVS

4 Yamaha FZ S V 2.0 150cc-Ltd. Edition 80000 Bangalore 10000 First Owner 3 150 Yamaha

In [23]: used_bikes.to_sql('used_bikes', connection, if_exists='replace', index=False)

Out[23]: 149

Hence the connection is established and the data is dumped into the SQL database. The pre-existing index (if any) has also been deleted.

In [24]: try:
             connection.execute(f"CREATE INDEX new_index ON used_bikes({column_name}({key_length}))")
         except Exception as e:
             print(f"Error creating index: {e}")
         finally:
             connection.close()

All the tasks are finished. I first deleted the index (if one existed) and then created a new index, which is beneficial for daily data dumping: the index does not have to be managed by hand on every load. Indexing the bike_name column speeds up lookups on that column, so I have applied my existing knowledge of the concept to get the best out of it.
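The drop-and-recreate pattern described above can be sketched end to end. The sketch below uses the stdlib sqlite3 module (which, unlike MySQL, does support DROP INDEX IF EXISTS) so it runs without a database server; the table and index names are illustrative:

```python
import sqlite3

# In-memory database standing in for the MySQL instance
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE used_bikes (bike_name TEXT, price INTEGER)")
conn.execute("INSERT INTO used_bikes VALUES ('TVS Star City', 35000)")

# Daily load: drop the old index if present, then rebuild it fresh
conn.execute("DROP INDEX IF EXISTS bike_name_index")
conn.execute("CREATE INDEX bike_name_index ON used_bikes(bike_name)")

# Confirm the index now exists in the schema catalogue
names = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='index'")]
print(names)  # ['bike_name_index']
conn.close()
```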
