Notes_Uber_Data_analysis_project

The document contains a Python notebook that processes an Uber trip dataset with pandas and visualizes it with matplotlib and seaborn. It covers data loading, preprocessing (filling missing values, parsing dates, dropping incomplete rows), and visualization through count, line, bar, box, and distribution plots. The dataset has 1156 rows and 7 columns: start and end dates, category, start and stop locations, miles traveled, and trip purpose.

Uploaded by rammilansahu94250


11/21/24, 4:44 PM Untitled

In [2]: import pandas as pd

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [4]: dataset = pd.read_csv("UberDataset.csv")

In [6]: dataset

Out[6]:
      START_DATE        END_DATE          CATEGORY  START             STOP              MILES    PURPO
0     01-01-2016 21:11  01-01-2016 21:17  Business  Fort Pierce       Fort Pierce       5.1      Meal/Entert
1     01-02-2016 01:25  01-02-2016 01:37  Business  Fort Pierce       Fort Pierce       5.0      N
2     01-02-2016 20:25  01-02-2016 20:38  Business  Fort Pierce       Fort Pierce       4.8      Errand/Suppl
3     01-05-2016 17:31  01-05-2016 17:45  Business  Fort Pierce       Fort Pierce       4.7      Meeti
4     01-06-2016 14:42  01-06-2016 15:49  Business  Fort Pierce       West Palm Beach   63.7     Customer V
...   ...               ...               ...       ...               ...               ...
1151  12/31/2016 13:24  12/31/2016 13:42  Business  Kar?chi           Unknown Location  3.9      Temporary S
1152  12/31/2016 15:03  12/31/2016 15:38  Business  Unknown Location  Unknown Location  16.2     Meeti
1153  12/31/2016 21:32  12/31/2016 21:50  Business  Katunayake        Gampaha           6.4      Temporary S
1154  12/31/2016 22:08  12/31/2016 23:51  Business  Gampaha           Ilukwatta         48.2     Temporary S
1155  Totals            NaN               NaN       NaN               NaN               12204.7  N

1156 rows × 7 columns

In [8]: dataset.shape

Out[8]: (1156, 7)

In [10]: dataset.info()

file:///C:/Users/swati/Downloads/Untitled.html 1/11

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1156 entries, 0 to 1155
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   START_DATE  1156 non-null   object
 1   END_DATE    1155 non-null   object
 2   CATEGORY    1155 non-null   object
 3   START       1155 non-null   object
 4   STOP        1155 non-null   object
 5   MILES       1156 non-null   float64
 6   PURPOSE     653 non-null    object
dtypes: float64(1), object(6)
memory usage: 63.3+ KB
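
The info() output above shows PURPOSE with only 653 non-null values out of 1156 rows. A minimal sketch of counting the gaps per column directly is given here; the frame `sample` and its values are made up for illustration and are not rows taken from UberDataset.csv.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame; the values are invented for this example.
sample = pd.DataFrame({
    "MILES": [5.1, 5.0, 4.8],
    "PURPOSE": ["Meeting", np.nan, np.nan],
})

# isna() marks missing cells as True; summing the booleans counts them per column.
missing_per_column = sample.isna().sum()
print(missing_per_column)
```

Running the same two calls on `dataset` would be one way to decide which columns need filling before any rows are dropped.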

Data Preprocessing
In [15]: dataset['PURPOSE'] = dataset['PURPOSE'].fillna("NOT")  # missing purposes become "NOT"
# The filled column is assigned back instead of calling
# dataset['PURPOSE'].fillna("NOT", inplace = True): that chained in-place form
# raises a FutureWarning in current pandas, which states that it will no longer
# work in pandas 3.0 because the intermediate object always behaves as a copy.

In [17]: dataset.head()

Out[17]:
   START_DATE        END_DATE          CATEGORY  START        STOP             MILES  PURPOSE
0  01-01-2016 21:11  01-01-2016 21:17  Business  Fort Pierce  Fort Pierce      5.1    Meal/Entertain
1  01-02-2016 01:25  01-02-2016 01:37  Business  Fort Pierce  Fort Pierce      5.0    NOT
2  01-02-2016 20:25  01-02-2016 20:38  Business  Fort Pierce  Fort Pierce      4.8    Errand/Supplies
3  01-05-2016 17:31  01-05-2016 17:45  Business  Fort Pierce  Fort Pierce      4.7    Meeting
4  01-06-2016 14:42  01-06-2016 15:49  Business  Fort Pierce  West Palm Beach  63.7   Customer Visit

In [19]: dataset['START_DATE'] = pd.to_datetime(dataset['START_DATE'], errors = 'coerce')

dataset['END_DATE'] = pd.to_datetime(dataset['END_DATE'], errors = 'coerce')
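
The next info() call reports 421 non-null START_DATE values instead of 1156. Judging from the preview, one plausible reason is that the column mixes two layouts (01-01-2016 21:11 and 12/31/2016 13:24), so strings that do not match the layout pandas settles on are coerced to NaT. The sketch below parses both layouts explicitly. `parse_mixed_dates` is a hypothetical helper, and month-first order in both layouts is an assumption, not something the source confirms.

```python
import pandas as pd

def parse_mixed_dates(values: pd.Series) -> pd.Series:
    """Parse strings written as MM-DD-YYYY HH:MM or MM/DD/YYYY HH:MM."""
    # Assumption: both layouts put the month first.
    dashed = pd.to_datetime(values, format="%m-%d-%Y %H:%M", errors="coerce")
    slashed = pd.to_datetime(values, format="%m/%d/%Y %H:%M", errors="coerce")
    # Keep the dashed parse where it succeeded, otherwise use the slashed one.
    return dashed.fillna(slashed)

raw = pd.Series(["01-01-2016 21:11", "12/31/2016 13:24", "Totals"])
parsed = parse_mixed_dates(raw)
print(parsed)
```

Only a string that matches neither layout, such as the "Totals" marker in the last row, stays NaT with this approach.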


In [21]: dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1156 entries, 0 to 1155
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   START_DATE  421 non-null    datetime64[ns]
 1   END_DATE    420 non-null    datetime64[ns]
 2   CATEGORY    1155 non-null   object
 3   START       1155 non-null   object
 4   STOP        1155 non-null   object
 5   MILES       1156 non-null   float64
 6   PURPOSE     1156 non-null   object
dtypes: datetime64[ns](2), float64(1), object(4)
memory usage: 63.3+ KB

In [23]: dataset['date'] = dataset['START_DATE'].dt.date  # calendar date of the trip start
dataset['time'] = dataset['START_DATE'].dt.hour  # hour of day (0-23) of the trip start

In [25]: dataset.head()

Out[25]:
   START_DATE           END_DATE             CATEGORY  START        STOP             MILES  PURPOSE          date        t
0  2016-01-01 21:11:00  2016-01-01 21:17:00  Business  Fort Pierce  Fort Pierce      5.1    Meal/Entertain   2016-01-01
1  2016-01-02 01:25:00  2016-01-02 01:37:00  Business  Fort Pierce  Fort Pierce      5.0    NOT              2016-01-02
2  2016-01-02 20:25:00  2016-01-02 20:38:00  Business  Fort Pierce  Fort Pierce      4.8    Errand/Supplies  2016-01-02
3  2016-01-05 17:31:00  2016-01-05 17:45:00  Business  Fort Pierce  Fort Pierce      4.7    Meeting          2016-01-05
4  2016-01-06 14:42:00  2016-01-06 15:49:00  Business  Fort Pierce  West Palm Beach  63.7   Customer Visit   2016-01-06

In [27]: dataset['day-night'] = pd.cut(x=dataset['time'],bins = [0,10,15,19,24],labels =
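
The label list in the cell above is cut off in the source, so the sketch below uses hypothetical labels to show how pd.cut assigns hours to the bins [0, 10, 15, 19, 24]. By default each bin excludes its left edge, so hour 0 falls outside the first bin and becomes NaN; `include_lowest=True` is one way to keep it.

```python
import pandas as pd

hours = pd.Series([0, 5, 10, 11, 16, 21])
bins = [0, 10, 15, 19, 24]
# Hypothetical labels: the original list is truncated in the source.
labels = ["Morning", "Afternoon", "Evening", "Night"]

# Default behaviour: intervals are (0, 10], (10, 15], (15, 19], (19, 24].
default_cut = pd.cut(x=hours, bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left, so hour 0 is kept.
inclusive_cut = pd.cut(x=hours, bins=bins, labels=labels, include_lowest=True)

print(pd.DataFrame({"hour": hours, "default": default_cut, "inclusive": inclusive_cut}))
```

This matters for any later dropna() call, since rows whose bin is NaN would be removed along with the genuinely incomplete ones.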

In [29]: dataset.head()


Out[29]:
   START_DATE           END_DATE             CATEGORY  START        STOP             MILES  PURPOSE          date        t
0  2016-01-01 21:11:00  2016-01-01 21:17:00  Business  Fort Pierce  Fort Pierce      5.1    Meal/Entertain   2016-01-01
1  2016-01-02 01:25:00  2016-01-02 01:37:00  Business  Fort Pierce  Fort Pierce      5.0    NOT              2016-01-02
2  2016-01-02 20:25:00  2016-01-02 20:38:00  Business  Fort Pierce  Fort Pierce      4.8    Errand/Supplies  2016-01-02
3  2016-01-05 17:31:00  2016-01-05 17:45:00  Business  Fort Pierce  Fort Pierce      4.7    Meeting          2016-01-05
4  2016-01-06 14:42:00  2016-01-06 15:49:00  Business  Fort Pierce  West Palm Beach  63.7   Customer Visit   2016-01-06

In [33]: dataset.dropna(inplace = True)

In [35]: dataset.shape

Out[35]: (413, 10)
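
Called without arguments, dropna() removes every row that has a missing value in any column, which is how the 1156 rows shrink to 413 here. The sketch below contrasts that with the `subset` argument on a small made-up frame; the frame `trips` and its values are illustrative only.

```python
import numpy as np
import pandas as pd

# Small invented frame: row 1 has no start date, row 2 has no stop location.
trips = pd.DataFrame({
    "START_DATE": pd.to_datetime(["2016-01-01 21:11", None, "2016-01-02 20:25"]),
    "MILES": [5.1, 5.0, 4.8],
    "STOP": ["Fort Pierce", "Fort Pierce", np.nan],
})

# Drops a row if ANY column is missing.
all_complete = trips.dropna()

# Drops a row only if START_DATE is missing; other gaps are tolerated.
dated_only = trips.dropna(subset=["START_DATE"])

print(len(all_complete), len(dated_only))
```

Restricting the check to the columns an analysis actually needs is one way to avoid discarding more trips than necessary.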

Data Visualization
In [46]: plt.figure(figsize=(20,5))

plt.subplot(1,2,1)

sns.countplot(dataset['CATEGORY'])
plt.xticks(rotation =90)

plt.subplot(1,2,2)
sns.countplot(dataset['PURPOSE'])

Out[46]: <Axes: xlabel='count', ylabel='PURPOSE'>

In [48]: sns.countplot(dataset['day-night'])


Out[48]: <Axes: xlabel='count', ylabel='day-night'>

In [50]: dataset.head()

Out[50]:
   START_DATE           END_DATE             CATEGORY  START        STOP             MILES  PURPOSE          date        t
0  2016-01-01 21:11:00  2016-01-01 21:17:00  Business  Fort Pierce  Fort Pierce      5.1    Meal/Entertain   2016-01-01
1  2016-01-02 01:25:00  2016-01-02 01:37:00  Business  Fort Pierce  Fort Pierce      5.0    NOT              2016-01-02
2  2016-01-02 20:25:00  2016-01-02 20:38:00  Business  Fort Pierce  Fort Pierce      4.8    Errand/Supplies  2016-01-02
3  2016-01-05 17:31:00  2016-01-05 17:45:00  Business  Fort Pierce  Fort Pierce      4.7    Meeting          2016-01-05
4  2016-01-06 14:42:00  2016-01-06 15:49:00  Business  Fort Pierce  West Palm Beach  63.7   Customer Visit   2016-01-06

In [52]: dataset['MONTH'] = pd.DatetimeIndex(dataset['START_DATE']).month  # month number taken from START_DATE

month_label = {1.0: 'Jan', 2.0: 'Feb', 3.0: 'Mar', 4.0: 'April',
               5.0: 'May', 6.0: 'June', 7.0: 'July', 8.0: 'Aug',
               9.0: 'Sep', 10.0: 'Oct', 11.0: 'Nov', 12.0: 'Dec'}  # month number -> month name

dataset["MONTH"] = dataset.MONTH.map(month_label)  # replace month numbers with month names

mon = dataset.MONTH.value_counts(sort=False)  # number of trips in each month

In [54]: dataset.head()

Out[54]:
   START_DATE           END_DATE             CATEGORY  START        STOP             MILES  PURPOSE          date        t
0  2016-01-01 21:11:00  2016-01-01 21:17:00  Business  Fort Pierce  Fort Pierce      5.1    Meal/Entertain   2016-01-01
1  2016-01-02 01:25:00  2016-01-02 01:37:00  Business  Fort Pierce  Fort Pierce      5.0    NOT              2016-01-02
2  2016-01-02 20:25:00  2016-01-02 20:38:00  Business  Fort Pierce  Fort Pierce      4.8    Errand/Supplies  2016-01-02
3  2016-01-05 17:31:00  2016-01-05 17:45:00  Business  Fort Pierce  Fort Pierce      4.7    Meeting          2016-01-05
4  2016-01-06 14:42:00  2016-01-06 15:49:00  Business  Fort Pierce  West Palm Beach  63.7   Customer Visit   2016-01-06

In [58]: df = pd.DataFrame({
    "MONTHS": mon.values,  # number of trips in each month
    "VALUE COUNT": dataset.groupby('MONTH', sort=False)['MILES'].max()  # maximum MILES in each month
})

p = sns.lineplot(data=df)  # draw the line plot

p.set(xlabel="MONTHS", ylabel="VALUE COUNT")  # set the axis labels

Out[58]: [Text(0.5, 0, 'MONTHS'), Text(0, 0.5, 'VALUE COUNT')]
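
In the cell above, the column named "MONTHS" holds the number of trips per month and "VALUE COUNT" holds the largest MILES value per month. A hedged sketch of building the same two quantities with self-describing column names and calendar order follows; `monthly_summary`, `trip_count`, and `max_miles` are hypothetical names, and the sample rows are made up.

```python
import calendar
import pandas as pd

def monthly_summary(trips: pd.DataFrame) -> pd.DataFrame:
    """Trips per month and the longest trip per month, in calendar order."""
    month_number = trips["START_DATE"].dt.month
    # Grouping by the numeric month keeps Jan..Dec order; names are applied afterwards.
    summary = trips.groupby(month_number)["MILES"].agg(trip_count="count", max_miles="max")
    summary.index = [calendar.month_abbr[int(m)] for m in summary.index]
    summary.index.name = "MONTH"
    return summary

sample_trips = pd.DataFrame({
    "START_DATE": pd.to_datetime(["2016-01-01 21:11", "2016-01-06 14:42", "2016-03-02 09:00"]),
    "MILES": [5.1, 63.7, 4.8],
})
summary = monthly_summary(sample_trips)
print(summary)
```

Passing a frame like `summary` to sns.lineplot(data=...) would then produce a legend that names what each line measures.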


In [60]: dataset.head()

Out[60]:
   START_DATE           END_DATE             CATEGORY  START        STOP             MILES  PURPOSE          date        t
0  2016-01-01 21:11:00  2016-01-01 21:17:00  Business  Fort Pierce  Fort Pierce      5.1    Meal/Entertain   2016-01-01
1  2016-01-02 01:25:00  2016-01-02 01:37:00  Business  Fort Pierce  Fort Pierce      5.0    NOT              2016-01-02
2  2016-01-02 20:25:00  2016-01-02 20:38:00  Business  Fort Pierce  Fort Pierce      4.8    Errand/Supplies  2016-01-02
3  2016-01-05 17:31:00  2016-01-05 17:45:00  Business  Fort Pierce  Fort Pierce      4.7    Meeting          2016-01-05
4  2016-01-06 14:42:00  2016-01-06 15:49:00  Business  Fort Pierce  West Palm Beach  63.7   Customer Visit   2016-01-06

In [64]: dataset['DAY'] = dataset.START_DATE.dt.weekday

day_label = {
0: 'Mon', 1:'Tues', 2:'Wed', 3:'Thur',4:'Fri', 5:'Sat', 6:'Sun'}

dataset['DAY'] = dataset['DAY'].map(day_label)


In [66]: dataset.head()

Out[66]:
   START_DATE           END_DATE             CATEGORY  START        STOP             MILES  PURPOSE          date        t
0  2016-01-01 21:11:00  2016-01-01 21:17:00  Business  Fort Pierce  Fort Pierce      5.1    Meal/Entertain   2016-01-01
1  2016-01-02 01:25:00  2016-01-02 01:37:00  Business  Fort Pierce  Fort Pierce      5.0    NOT              2016-01-02
2  2016-01-02 20:25:00  2016-01-02 20:38:00  Business  Fort Pierce  Fort Pierce      4.8    Errand/Supplies  2016-01-02
3  2016-01-05 17:31:00  2016-01-05 17:45:00  Business  Fort Pierce  Fort Pierce      4.7    Meeting          2016-01-05
4  2016-01-06 14:42:00  2016-01-06 15:49:00  Business  Fort Pierce  West Palm Beach  63.7   Customer Visit   2016-01-06

In [68]: day_counts = dataset.DAY.value_counts()  # trips per weekday; a new name keeps the day_label dict intact

sns.barplot(x=day_counts.index, y=day_counts)
plt.xlabel('DAY')
plt.ylabel('COUNT')

Out[68]: Text(0, 0.5, 'COUNT')
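
value_counts() orders its result by frequency, so the bars above do not follow the order of the week. A small sketch of counting trips per weekday in Monday-to-Sunday order is shown below; `trips_per_weekday` and `WEEKDAY_ORDER` are hypothetical names, and it uses the full names returned by `dt.day_name()` rather than the abbreviations in the notebook.

```python
import pandas as pd

WEEKDAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

def trips_per_weekday(start_dates: pd.Series) -> pd.Series:
    """Count trips per weekday, always returning all seven days in week order."""
    counts = start_dates.dt.day_name().value_counts()
    # reindex puts the days in calendar order and fills absent days with 0.
    return counts.reindex(WEEKDAY_ORDER, fill_value=0)

starts = pd.Series(pd.to_datetime([
    "2016-01-01 21:11",  # Friday
    "2016-01-02 01:25",  # Saturday
    "2016-01-02 20:25",  # Saturday
    "2016-01-05 17:31",  # Tuesday
]))
weekday_counts = trips_per_weekday(starts)
print(weekday_counts)
```

The resulting Series can be passed to sns.barplot in the same way as the frequency-ordered counts.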


In [70]: dataset.head()

Out[70]:
   START_DATE           END_DATE             CATEGORY  START        STOP             MILES  PURPOSE          date        t
0  2016-01-01 21:11:00  2016-01-01 21:17:00  Business  Fort Pierce  Fort Pierce      5.1    Meal/Entertain   2016-01-01
1  2016-01-02 01:25:00  2016-01-02 01:37:00  Business  Fort Pierce  Fort Pierce      5.0    NOT              2016-01-02
2  2016-01-02 20:25:00  2016-01-02 20:38:00  Business  Fort Pierce  Fort Pierce      4.8    Errand/Supplies  2016-01-02
3  2016-01-05 17:31:00  2016-01-05 17:45:00  Business  Fort Pierce  Fort Pierce      4.7    Meeting          2016-01-05
4  2016-01-06 14:42:00  2016-01-06 15:49:00  Business  Fort Pierce  West Palm Beach  63.7   Customer Visit   2016-01-06

In [74]: sns.boxplot(dataset['MILES'])

Out[74]: <Axes: ylabel='MILES'>

In [78]: sns.boxplot(dataset[dataset['MILES']<100]['MILES'])

Out[78]: <Axes: ylabel='MILES'>


In [82]: sns.boxplot(dataset[dataset['MILES']<40]['MILES'])

Out[82]: <Axes: ylabel='MILES'>
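
The cut-offs of 100 and 40 miles used above are hand-picked. As an alternative, the sketch below derives bounds from the data with the common 1.5 × IQR rule. This is a swapped-in technique for illustration, not something the notebook does; `iqr_bounds` is a hypothetical helper and the sample values are made up.

```python
import pandas as pd

def iqr_bounds(values: pd.Series, k: float = 1.5):
    """Return (lower, upper) bounds at k * IQR beyond the first and third quartiles."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

miles = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])
lower, upper = iqr_bounds(miles)

# Keep only the trips that fall inside the bounds.
typical = miles[(miles >= lower) & (miles <= upper)]
print(lower, upper, list(typical))
```

A filtered Series like `typical` could be passed to sns.boxplot in place of the fixed-threshold selections.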

In [86]: sns.distplot(dataset[dataset['MILES']<40]['MILES'])


C:\Users\swati\AppData\Local\Temp\ipykernel_31136\1678554178.py:1: UserWarning:

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

sns.distplot(dataset[dataset['MILES']<40]['MILES'])
Out[86]: <Axes: xlabel='MILES', ylabel='Density'>


