Neenopal Data Analysis Task 2

Uploaded by muditchechi03

Neenopal Task

Task 1 - Reading the csv File

In [ ]: pip install sqlalchemy

In [2]: import numpy as np
        import pandas as pd
        from sqlalchemy import create_engine
        import pymysql

In [3]: used_bikes = pd.read_csv(r"E:\Excel\Neenopal\used_bikes.csv")

In [4]: used_bikes.head(30)

Out[4]:    bike_name  price  city  kms_driven  owner  age  power  brand

0   TVS Star City Plus Dual Tone 110cc        35000.0   Ahmedabad  17654.0  First Owner   3.0   110.0  TVS
1   Royal Enfield Classic 350cc               119900.0  Delhi      11000.0  First Owner   4.0   350.0  Royal Enfield
2   Triumph Daytona 675R                      600000.0  Delhi      110.0    First Owner   8.0   675.0  Triumph
3   TVS Apache RTR 180cc                      65000.0   Bangalore  16329.0  First Owner   4.0   180.0  TVS
4   Yamaha FZ S V 2.0 150cc-Ltd. Edition      80000.0   Bangalore  10000.0  First Owner   3.0   150.0  Yamaha
5   Yamaha FZs 150cc                          53499.0   Delhi      25000.0  First Owner   6.0   150.0  Yamaha
6   Honda CB Hornet 160R ABS DLX              85000.0   Delhi      8200.0   First Owner   3.0   160.0  Honda
7   Hero Splendor Plus Self Alloy 100cc       45000.0   Delhi      12645.0  First Owner   3.0   100.0  Hero
8   Royal Enfield Thunderbird X 350cc         145000.0  Bangalore  9190.0   First Owner   3.0   350.0  Royal Enfield
9   Royal Enfield Classic Desert Storm 500cc  88000.0   Delhi      19000.0  Second Owner  7.0   500.0  Royal Enfield
10  Yamaha YZF-R15 2.0 150cc                  72000.0   Bangalore  20000.0  First Owner   7.0   150.0  Yamaha
11  Yamaha FZ25 250cc                         95000.0   Bangalore  9665.0   First Owner   4.0   250.0  Yamaha
12  Bajaj Pulsar NS200                        78000.0   Bangalore  9900.0   First Owner   4.0   200.0  Bajaj
13  Bajaj Discover 100M                       29499.0   Delhi      20000.0  First Owner   8.0   100.0  Bajaj
14  Bajaj Discover 125M                       29900.0   Delhi      20000.0  First Owner   7.0   125.0  Bajaj
15  Bajaj Pulsar NS200 ABS                    90000.0   Bangalore  11574.0  First Owner   3.0   200.0  Bajaj
16  Bajaj Pulsar RS200 ABS                    120000.0  Bangalore  23000.0  First Owner   3.0   200.0  Bajaj
17  Suzuki Gixxer SF 150cc                    48000.0   Mumbai     24725.0  First Owner   5.0   150.0  Suzuki
18  Benelli 302R 300CC                        240000.0  Mumbai     15025.0  Second Owner  3.0   302.0  Benelli
19  Bajaj Discover 125M                       29900.0   Delhi      20000.0  First Owner   7.0   125.0  Bajaj
20  Bajaj Pulsar RS200 ABS                    120000.0  Bangalore  23000.0  First Owner   3.0   200.0  Bajaj
21  Suzuki Gixxer SF 150cc                    48000.0   Mumbai     24725.0  First Owner   5.0   150.0  Suzuki
22  Hero Splendor iSmart Plus IBS 110cc       46500.0   Delhi      3500.0   First Owner   2.0   110.0  Hero
23  Royal Enfield Classic Chrome 500cc        121700.0  Kalyan     24520.0  First Owner   5.0   500.0  Royal Enfield
24  Yamaha FZ V 2.0 150cc                     45000.0   Delhi      23000.0  First Owner   6.0   150.0  Yamaha
25  Bajaj Pulsar NS200                        78000.0   Bangalore  9900.0   First Owner   4.0   200.0  Bajaj
26  Hero Super Splendor 125cc                 20000.0   Ahmedabad  29305.0  First Owner   16.0  125.0  Hero
27  Honda CBF Stunner 125cc                   20800.0   Faridabad  30500.0  Second Owner  7.0   125.0  Honda
28  Bajaj Pulsar 150cc                        50000.0   Bangalore  19000.0  First Owner   8.0   150.0  Bajaj
29  Honda X-Blade 160CC ABS                   81200.0   Mettur     9100.0   First Owner   2.0   160.0  Honda

Memory Usage before changing data types


In [5]: memory_usage_before = used_bikes.memory_usage(deep=True)

In [6]: memory_usage_before

Out[6]:
Index           132
bike_name     11954
price          1192
city           9546
kms_driven     1192
owner         10144
age            1192
power          1192
brand          9517
dtype: int64

Sum of memory usage before changing data types


In [7]: memory_usage_before_sum = used_bikes.memory_usage(deep=True).sum()

In [8]: memory_usage_before_sum  # sum of memory used by each column

Out[8]: 46061

Task 2 - Changing Data Types


First we will check the initial data types of each column.

In [10]: used_bikes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 149 entries, 0 to 148
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 bike_name 149 non-null object
1 price 149 non-null float64
2 city 149 non-null object
3 kms_driven 149 non-null float64
4 owner 149 non-null object
5 age 149 non-null float64
6 power 149 non-null float64
7 brand 149 non-null object
dtypes: float64(4), object(4)
memory usage: 9.4+ KB

As we can see, the price, kms_driven, age and power columns are of float type. We can convert all of these to an integer type, since inspecting the csv file shows that no decimal values are present.

In [11]: used_bikes["price"] = used_bikes["price"].astype(int)
         used_bikes["kms_driven"] = used_bikes["kms_driven"].astype(int)
         used_bikes["age"] = used_bikes["age"].astype(int)
         used_bikes["power"] = used_bikes["power"].astype(int)
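As a side note, `pd.to_numeric` with its `downcast` parameter can choose the smallest integer type that fits each column, which saves even more memory than a plain `astype(int)`. A minimal sketch on hypothetical sample values (not the full dataset):

```python
import pandas as pd

# Hypothetical sample mirroring two of the converted columns
df = pd.DataFrame({"price": [35000.0, 119900.0], "age": [3.0, 4.0]})

# downcast="integer" picks the smallest integer dtype that can hold the
# values: int32 for price (max 119900), int8 for age (max 4)
for col in ["price", "age"]:
    df[col] = pd.to_numeric(df[col], downcast="integer")

print(df.dtypes.to_dict())
```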

In [12]: used_bikes.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 149 entries, 0 to 148
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 bike_name 149 non-null object
1 price 149 non-null int32
2 city 149 non-null object
3 kms_driven 149 non-null int32
4 owner 149 non-null object
5 age 149 non-null int32
6 power 149 non-null int32
7 brand 149 non-null object
dtypes: int32(4), object(4)
memory usage: 7.1+ KB

The data types are changed. As we can see, the memory usage decreased from 9.4+ KB to 7.1+ KB. We will also compare it column by column, as well as by the total memory used.

In [13]: memory_usage_after = used_bikes.memory_usage(deep=True)

In [14]: memory_usage_after_sum = used_bikes.memory_usage(deep=True).sum()

In [15]: memory_usage_after_sum

Out[15]: 43677

In [16]: memory_usage_after

Out[16]:
Index           132
bike_name     11954
price           596
city           9546
kms_driven      596
owner         10144
age             596
power           596
brand          9517
dtype: int64
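To put the two totals in perspective, the saving can be computed directly. As a hypothetical further step (not performed in this notebook), repetitive object columns such as owner, city and brand would shrink much more if converted to the pandas category dtype; both points sketched below:

```python
import pandas as pd

# Totals copied from the Out[8] and Out[15] cells above
before, after = 46061, 43677
saving_pct = 100 * (before - after) / before
print(f"saved {saving_pct:.1f}%")  # about 5.2%

# Hypothetical further step: a repetitive object column, with only two
# distinct values across 149 rows, stored as category instead of object
owner = pd.Series(["First Owner"] * 100 + ["Second Owner"] * 49)
obj_bytes = owner.memory_usage(deep=True)
cat_bytes = owner.astype("category").memory_usage(deep=True)
print(obj_bytes > cat_bytes)  # category uses far fewer bytes here
```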

Task 3 - Dumping Data into Sql


There are many techniques to establish a connection between Python and MySQL. I will be using SQLAlchemy to establish the connection.
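For reference, a SQLAlchemy connection URL follows a fixed anatomy: dialect+driver, credentials, host and database name. The values below are placeholders for illustration, not the actual credentials or database name used in this notebook:

```python
# Anatomy of the URL: dialect+driver://user:password@host/database
# (all four values here are placeholders)
user, password, host, database = "root", "my_password", "localhost", "my_database"
url = f"mysql+pymysql://{user}:{password}@{host}/{database}"
print(url)
```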

In [17]: database_connection = create_engine("mysql+pymysql://root:Yaadnhihai_1629@localhost/neen

In [18]: connection = database_connection.connect()

Now that the connection is established, I will load the data into the SQL database.

In [19]: table_name = "used_bikes"  # must be the table name as a string, not the DataFrame itself

In [20]: column_name = "bike_name"
         key_length = 50

In [21]: # Drop the existing index if it is present. Note that MySQL does not
         # support DROP INDEX IF EXISTS, so any error is caught and ignored.

         try:
             connection.execute(f"DROP INDEX IF EXISTS {column_name}_index ON {table_name}")
         except Exception:
             pass

In [22]: used_bikes.head()

Out[22]: bike_name price city kms_driven owner age power brand

0 TVS Star City Plus Dual Tone 110cc 35000 Ahmedabad 17654 First Owner 3 110 TVS

1 Royal Enfield Classic 350cc 119900 Delhi 11000 First Owner 4 350 Royal Enfield

2 Triumph Daytona 675R 600000 Delhi 110 First Owner 8 675 Triumph

3 TVS Apache RTR 180cc 65000 Bangalore 16329 First Owner 4 180 TVS

4 Yamaha FZ S V 2.0 150cc-Ltd. Edition 80000 Bangalore 10000 First Owner 3 150 Yamaha

In [23]: used_bikes.to_sql('used_bikes', connection, if_exists='replace', index=False)

Out[23]: 149

Hence the connection is established and the data is dumped into the SQL database. The pre-existing index (if any) has also been deleted.

In [24]: try:
             connection.execute(f"CREATE INDEX new_index ON used_bikes({column_name}({key_length}))")
         except Exception as e:
             print(f"Error creating index: {e}")
         finally:
             connection.close()

All the tasks are finished. I first deleted the index (if one existed) and then created a new index, which is beneficial for daily data dumping: the index does not have to be managed by hand on every load. Indexing the bike_name column speeds up lookups on that column, so I have applied my existing knowledge of the concept to get the best out of it.
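The drop-and-recreate pattern described above can be sketched end to end. The sketch below uses the stdlib sqlite3 module (which, unlike MySQL, does support DROP INDEX IF EXISTS) so it runs without a database server; the table and index names are illustrative:

```python
import sqlite3

# In-memory database standing in for the MySQL instance
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE used_bikes (bike_name TEXT, price INTEGER)")
conn.execute("INSERT INTO used_bikes VALUES ('TVS Star City', 35000)")

# Daily load: drop the old index if present, then rebuild it fresh
conn.execute("DROP INDEX IF EXISTS bike_name_index")
conn.execute("CREATE INDEX bike_name_index ON used_bikes(bike_name)")

# Confirm the index now exists in the schema catalogue
names = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='index'")]
print(names)  # ['bike_name_index']
conn.close()
```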
