Neenopal Data Analysis Task 2
In [4]: used_bikes.head(30)
bike_name price city kms_driven owner age power brand
0 TVS Star City Plus Dual Tone 110cc 35000.0 Ahmedabad 17654.0 First Owner 3.0 110.0 TVS
1 Royal Enfield Classic 350cc 119900.0 Delhi 11000.0 First Owner 4.0 350.0 Royal Enfield
2 Triumph Daytona 675R 600000.0 Delhi 110.0 First Owner 8.0 675.0 Triumph
3 TVS Apache RTR 180cc 65000.0 Bangalore 16329.0 First Owner 4.0 180.0 TVS
4 Yamaha FZ S V 2.0 150cc-Ltd. Edition 80000.0 Bangalore 10000.0 First Owner 3.0 150.0 Yamaha
5 Yamaha FZs 150cc 53499.0 Delhi 25000.0 First Owner 6.0 150.0 Yamaha
6 Honda CB Hornet 160R ABS DLX 85000.0 Delhi 8200.0 First Owner 3.0 160.0 Honda
7 Hero Splendor Plus Self Alloy 100cc 45000.0 Delhi 12645.0 First Owner 3.0 100.0 Hero
8 Royal Enfield Thunderbird X 350cc 145000.0 Bangalore 9190.0 First Owner 3.0 350.0 Royal Enfield
10 Yamaha YZF-R15 2.0 150cc 72000.0 Bangalore 20000.0 First Owner 7.0 150.0 Yamaha
11 Yamaha FZ25 250cc 95000.0 Bangalore 9665.0 First Owner 4.0 250.0 Yamaha
12 Bajaj Pulsar NS200 78000.0 Bangalore 9900.0 First Owner 4.0 200.0 Bajaj
13 Bajaj Discover 100M 29499.0 Delhi 20000.0 First Owner 8.0 100.0 Bajaj
14 Bajaj Discover 125M 29900.0 Delhi 20000.0 First Owner 7.0 125.0 Bajaj
15 Bajaj Pulsar NS200 ABS 90000.0 Bangalore 11574.0 First Owner 3.0 200.0 Bajaj
16 Bajaj Pulsar RS200 ABS 120000.0 Bangalore 23000.0 First Owner 3.0 200.0 Bajaj
17 Suzuki Gixxer SF 150cc 48000.0 Mumbai 24725.0 First Owner 5.0 150.0 Suzuki
18 Benelli 302R 300CC 240000.0 Mumbai 15025.0 Second Owner 3.0 302.0 Benelli
19 Bajaj Discover 125M 29900.0 Delhi 20000.0 First Owner 7.0 125.0 Bajaj
20 Bajaj Pulsar RS200 ABS 120000.0 Bangalore 23000.0 First Owner 3.0 200.0 Bajaj
21 Suzuki Gixxer SF 150cc 48000.0 Mumbai 24725.0 First Owner 5.0 150.0 Suzuki
22 Hero Splendor iSmart Plus IBS 110cc 46500.0 Delhi 3500.0 First Owner 2.0 110.0 Hero
23 Royal Enfield Classic Chrome 500cc 121700.0 Kalyan 24520.0 First Owner 5.0 500.0 Royal Enfield
24 Yamaha FZ V 2.0 150cc 45000.0 Delhi 23000.0 First Owner 6.0 150.0 Yamaha
25 Bajaj Pulsar NS200 78000.0 Bangalore 9900.0 First Owner 4.0 200.0 Bajaj
26 Hero Super Splendor 125cc 20000.0 Ahmedabad 29305.0 First Owner 16.0 125.0 Hero
27 Honda CBF Stunner 125cc 20800.0 Faridabad 30500.0 Second Owner 7.0 125.0 Honda
28 Bajaj Pulsar 150cc 50000.0 Bangalore 19000.0 First Owner 8.0 150.0 Bajaj
29 Honda X-Blade 160CC ABS 81200.0 Mettur 9100.0 First Owner 2.0 160.0 Honda
In [6]: memory_usage_before
Out[6]:
Index 132
bike_name 11954
price 1192
city 9546
kms_driven 1192
owner 10144
age 1192
power 1192
brand 9517
dtype: int64
In [10]: used_bikes.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 149 entries, 0 to 148
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 bike_name 149 non-null object
1 price 149 non-null float64
2 city 149 non-null object
3 kms_driven 149 non-null float64
4 owner 149 non-null object
5 age 149 non-null float64
6 power 149 non-null float64
7 brand 149 non-null object
dtypes: float64(4), object(4)
memory usage: 9.4+ KB
As we can see, the price, kms_driven, age and power columns are of float type. We can convert all of them to an
int data type because, on inspecting the CSV file, none of their values has a fractional part.
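The conversion cell itself is not shown here, so the following is only a minimal sketch of how the check and the cast could be done. The three-row sample frame is a stand-in for the real used_bikes data, and the choice of int32 is taken from the info() output that follows.

```python
import pandas as pd

# Stand-in sample using the first three rows of the head() output above
used_bikes = pd.DataFrame({
    "price": [35000.0, 119900.0, 600000.0],
    "kms_driven": [17654.0, 11000.0, 110.0],
    "age": [3.0, 4.0, 8.0],
    "power": [110.0, 350.0, 675.0],
})

numeric_cols = ["price", "kms_driven", "age", "power"]

# Confirm that every value is a whole number before discarding the decimals
is_whole = (used_bikes[numeric_cols] % 1 == 0).all()
if not is_whole.all():
    raise ValueError(f"Fractional values in: {is_whole[~is_whole].index.tolist()}")

# Cast the four float64 columns to int32
used_bikes = used_bikes.astype({col: "int32" for col in numeric_cols})
print(used_bikes.dtypes)
```

Checking first matters because astype silently truncates any fractional part, so a price such as 35000.5 would otherwise become 35000 without warning.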
In [12]: used_bikes.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 149 entries, 0 to 148
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 bike_name 149 non-null object
1 price 149 non-null int32
2 city 149 non-null object
3 kms_driven 149 non-null int32
4 owner 149 non-null object
5 age 149 non-null int32
6 power 149 non-null int32
7 brand 149 non-null object
dtypes: int32(4), object(4)
memory usage: 7.1+ KB
The data types have changed, and the memory usage reported by info() has dropped from 9.4+ KB to 7.1+ KB. We
will also compare the memory usage of each column, as well as the total across all columns.
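A per-column comparison can be sketched as below. I am assuming the figures were produced with memory_usage(deep=True), since the object columns report sizes far larger than 8 bytes per row. The sample frame and the comparison name are illustrative only.

```python
import pandas as pd

# Stand-in sample; the real frame has 149 rows and 8 columns
used_bikes = pd.DataFrame({
    "bike_name": ["TVS Apache RTR 180cc", "Yamaha FZ25 250cc", "Bajaj Pulsar NS200"],
    "price": [65000.0, 95000.0, 78000.0],
    "kms_driven": [16329.0, 9665.0, 9900.0],
})

# deep=True also counts the Python string objects held by object columns
memory_usage_before = used_bikes.memory_usage(deep=True)

used_bikes = used_bikes.astype({"price": "int32", "kms_driven": "int32"})
memory_usage_after = used_bikes.memory_usage(deep=True)

# Side-by-side view of the bytes used by each column
comparison = pd.DataFrame({"before": memory_usage_before, "after": memory_usage_after})
comparison["saved"] = comparison["before"] - comparison["after"]
print(comparison)
print("total before:", memory_usage_before.sum())
print("total after:", memory_usage_after.sum())
```

On the real data each numeric column falls from 1192 bytes (149 rows × 8 bytes) to 596 bytes (149 rows × 4 bytes), so the four columns together save 2384 bytes while the object columns are unchanged.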
In [15]: memory_usage_after_sum
Out[15]: 43677
In [16]: memory_usage_after
Out[16]:
Index 132
bike_name 11954
price 596
city 9546
kms_driven 596
owner 10144
age 596
power 596
brand 9517
dtype: int64
Now that the connection is established, I will load the data into the SQL database.
In [22]: used_bikes.head()
bike_name price city kms_driven owner age power brand
0 TVS Star City Plus Dual Tone 110cc 35000 Ahmedabad 17654 First Owner 3 110 TVS
1 Royal Enfield Classic 350cc 119900 Delhi 11000 First Owner 4 350 Royal Enfield
2 Triumph Daytona 675R 600000 Delhi 110 First Owner 8 675 Triumph
3 TVS Apache RTR 180cc 65000 Bangalore 16329 First Owner 4 180 TVS
4 Yamaha FZ S V 2.0 150cc-Ltd. Edition 80000 Bangalore 10000 First Owner 3 150 Yamaha
The connection is established and the data has been loaded into the SQL database. The existing index has also
been deleted.
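The cells that open the connection and load the data are not shown, so here is a rough sketch of that step. It uses an in-memory SQLite database as a stand-in for the real server, which is an assumption made so the example runs on its own. The table name used_bikes and the index name new_index come from the cell below.

```python
import sqlite3

import pandas as pd

# Stand-in sample using the first three rows of the converted frame
used_bikes = pd.DataFrame({
    "bike_name": ["TVS Star City Plus Dual Tone 110cc",
                  "Royal Enfield Classic 350cc",
                  "Triumph Daytona 675R"],
    "price": [35000, 119900, 600000],
    "city": ["Ahmedabad", "Delhi", "Delhi"],
})

# Assumption: SQLite in memory replaces the actual database connection
connection = sqlite3.connect(":memory:")
try:
    # Remove the old index, if any, so the load does not have to maintain it
    connection.execute("DROP INDEX IF EXISTS new_index")

    # Write the frame to a table, replacing the previous dump
    used_bikes.to_sql("used_bikes", connection, if_exists="replace", index=False)

    row_count = connection.execute("SELECT COUNT(*) FROM used_bikes").fetchone()[0]
    print(f"Loaded {row_count} rows")
finally:
    connection.close()
```

With a different database the connection object and the DROP INDEX syntax may differ, for example MySQL expects DROP INDEX new_index ON used_bikes.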
In [24]: try:
             connection.execute(f"CREATE INDEX new_index ON used_bikes({column_name}({key_length}))")
         except Exception as e:
             print(f"Error creating index: {e}")
         finally:
             connection.close()
All the tasks are finished. I first deleted the index (if it existed) and then created a new one, which is
beneficial for the daily data dumping. Indexing the dataset is very useful, so I applied my existing
knowledge of the concept to get the most out of it.
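For a daily load, the drop-then-create step could be wrapped in a small helper that is safe to run repeatedly. This is only a sketch: rebuild_index is a hypothetical name, SQLite stands in for the real database, and the prefix length used in the cell above is left out because SQLite does not support it.

```python
import sqlite3


def rebuild_index(conn, table, column, index_name="new_index"):
    """Drop the index if it exists, then create it again (hypothetical helper)."""
    # Identifiers cannot be passed as bound parameters, so validate them first
    for name in (table, column, index_name):
        if not name.isidentifier():
            raise ValueError(f"Unsafe identifier: {name!r}")
    conn.execute(f"DROP INDEX IF EXISTS {index_name}")
    conn.execute(f"CREATE INDEX {index_name} ON {table}({column})")
    conn.commit()


# Assumption: a tiny in-memory table replaces the real used_bikes table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE used_bikes (bike_name TEXT, price INTEGER)")
conn.execute("INSERT INTO used_bikes VALUES ('TVS Apache RTR 180cc', 65000)")

rebuild_index(conn, "used_bikes", "bike_name")
rebuild_index(conn, "used_bikes", "bike_name")  # a second run, as on the next day

# List the indexes that now exist on the table
index_names = [row[1] for row in conn.execute("PRAGMA index_list('used_bikes')")]
print(index_names)
conn.close()
```

Running the helper twice leaves exactly one index in place, which is the behaviour a repeated daily job needs.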