Pandas Datetime4

The document provides a series of Python code snippets demonstrating the use of the pandas library for date and time manipulation, including creating date ranges, extracting date features, and handling timestamps. It also includes a section on reading a CSV file containing COVID-19 data and processing it by dropping unnecessary columns. The code illustrates how to work with datetime objects and perform basic data analysis tasks.

In [1]: import pandas as pd

import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

In [2]: # Create a DatetimeIndex with an hourly frequency

## 10 hours starting with midnight Jan 1st, 2011
data = pd.date_range('1/1/2011', periods=10, freq='H')

data

Out[2]: DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 01:00:00',
                       '2011-01-01 02:00:00', '2011-01-01 03:00:00',
                       '2011-01-01 04:00:00', '2011-01-01 05:00:00',
                       '2011-01-01 06:00:00', '2011-01-01 07:00:00',
                       '2011-01-01 08:00:00', '2011-01-01 09:00:00'],
                      dtype='datetime64[ns]', freq='H')
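
The freq argument accepts other offset aliases besides 'H'; a minimal sketch using a few common aliases:

# A few other common offset aliases for pd.date_range
daily = pd.date_range('1/1/2011', periods=5, freq='D')       # calendar days
minutes = pd.date_range('1/1/2011', periods=5, freq='T')     # minutes
month_ends = pd.date_range('1/1/2011', periods=5, freq='M')  # month ends
daily, minutes, month_ends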

In [3]: # Get the current date and time and extract its components

## 10 hours starting with midnight Jan 1st, 2011
data = pd.date_range('1/1/2011', periods=10, freq='H')

# pd.datetime is deprecated (removed in pandas 2.x); pd.Timestamp.now() is the current equivalent
x = pd.Timestamp.now()
x.month, x.year

Out[3]: (9, 2023)

In [4]: #3: Break date and time into separate features


# Create date and time with dataframe
rng = pd.DataFrame()
rng['date'] = pd.date_range('1/1/2011', periods = 72, freq ='H')

# Preview the first five rows (only the cell's last expression is displayed)
rng[:5]

# Create features for year, month, day, hour, and minute


rng['year'] = rng['date'].dt.year
rng['month'] = rng['date'].dt.month
rng['day'] = rng['date'].dt.day
rng['hour'] = rng['date'].dt.hour
rng['minute'] = rng['date'].dt.minute

# Print the dates divided into features


rng.head(3)

Out[4]: date year month day hour minute

0 2011-01-01 00:00:00 2011 1 1 0 0

1 2011-01-01 01:00:00 2011 1 1 1 0

2 2011-01-01 02:00:00 2011 1 1 2 0
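
A minimal sketch of the dd-mm-yy formatting mentioned in the comment above, assuming the rng frame from In [4]; dt.strftime returns strings, not datetimes, and the new column name is illustrative:

# Render the dates as dd-mm-yy strings
rng['date_ddmmyy'] = rng['date'].dt.strftime('%d-%m-%y')
rng[['date', 'date_ddmmyy']].head(3)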

In [5]: #4: To get the present time, use Timestamp.now() and then convert timestamp to datetime

In [6]: # Get the present datetime using Timestamp

from pandas import Timestamp
t = pd.Timestamp.now()
t

Out[6]: Timestamp('2023-09-27 11:25:05.441747')
In [7]: # Convert timestamp to datetime
t.to_pydatetime()

Out[7]: datetime.datetime(2023, 9, 27, 11, 25, 5, 441747)

In [8]: t.year
Out[8]: 2023

In [9]: t.month
Out[9]: 9

In [10]: t.day
Out[10]: 27

In [11]: t.hour
Out[11]: 11

In [12]: t.minute
Out[12]: 25

In [13]: t.second
Out[13]: 5
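
The reverse conversion, wrapping a standard-library datetime back into a pandas Timestamp, is a one-liner; a minimal sketch:

from datetime import datetime

dt = datetime(2023, 9, 27, 11, 25, 5)
ts = pd.Timestamp(dt)      # plain datetime -> pandas Timestamp
ts2 = pd.to_datetime(dt)   # equivalent via pd.to_datetime
ts, ts2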

In [14]: import pandas as pd


# read csv file
df = pd.read_csv('covid-19-all.csv')
df.tail(25)

Out[14]: Country/Region Province/State Latitude Longitude Confirmed Recovered Deaths Date

1241927 United Arab Emirates NaN 23.424076 53.847818 207822.0 184442.0 669.0 2020-12-31
1241928 United Kingdom Anguilla 18.220600 -63.068600 13.0 12.0 0.0 2020-12-31
1241929 United Kingdom Bermuda 32.307800 -64.750500 604.0 445.0 10.0 2020-12-31
1241930 United Kingdom British Virgin Islands 18.420700 -64.640000 86.0 74.0 1.0 2020-12-31
1241931 United Kingdom Cayman Islands 19.313300 -81.254600 338.0 294.0 2.0 2020-12-31
1241932 United Kingdom Channel Islands 49.372300 -2.364400 3058.0 2256.0 58.0 2020-12-31
1241933 United Kingdom England 52.355500 -1.174300 2139956.0 0.0 64118.0 2020-12-31
1241934 United Kingdom Falkland Islands (Malvinas) -51.796300 -59.523600 29.0 17.0 0.0 2020-12-31
1241935 United Kingdom Gibraltar 36.140800 -5.353600 2040.0 1238.0 7.0 2020-12-31
1241936 United Kingdom Isle of Man 54.236100 -4.548100 377.0 348.0 25.0 2020-12-31
1241937 United Kingdom Montserrat 16.742498 -62.187366 13.0 12.0 1.0 2020-12-31
1241938 United Kingdom Northern Ireland 54.787700 -6.492300 72834.0 0.0 1322.0 2020-12-31
1241939 United Kingdom Scotland 56.490700 -4.202600 127453.0 0.0 4578.0 2020-12-31
1241940 United Kingdom Turks and Caicos Islands 21.694000 -71.797900 893.0 783.0 6.0 2020-12-31
1241941 United Kingdom Unknown 32.307800 -59.523600 0.0 0.0 0.0 2020-12-31
1241942 United Kingdom Wales 52.130700 -3.783700 148537.0 0.0 3494.0 2020-12-31
1241943 Uruguay NaN -32.522800 -55.765800 19119.0 13468.0 181.0 2020-12-31
1241944 Uzbekistan NaN 41.377491 64.585262 77060.0 74943.0 614.0 2020-12-31
1241945 Vanuatu NaN -15.376700 166.959200 1.0 1.0 0.0 2020-12-31
1241946 Venezuela NaN 6.423800 -66.589700 113558.0 107583.0 1028.0 2020-12-31
1241947 Vietnam NaN 14.058324 108.277199 1465.0 1325.0 35.0 2020-12-31
1241948 West Bank and Gaza NaN 31.952200 35.233200 138004.0 117183.0 1400.0 2020-12-31
1241949 Yemen NaN 15.552727 48.516388 2099.0 1394.0 610.0 2020-12-31
1241950 Zambia NaN -13.133897 27.849332 20725.0 18660.0 388.0 2020-12-31
1241951 Zimbabwe NaN -19.015438 29.154857 13867.0 11250.0 363.0 2020-12-31
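
As a hedged aside, read_csv can parse the Date column while loading, which would make the later pd.to_datetime step in In [17] unnecessary; a minimal sketch assuming the same file name:

# Parse the Date column during the read itself
df = pd.read_csv('covid-19-all.csv', parse_dates=['Date'])
df.dtypes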

In [15]: df=df.drop(columns=['Province/State','Latitude','Longitude'])
df.head()

Out[15]: Country/Region Confirmed Recovered Deaths Date

0 NaN 51526.0 41727.0 2191.0 2021-01-01

1 NaN 58316.0 33634.0 1181.0 2021-01-01

2 NaN 99897.0 67395.0 2762.0 2021-01-01

3 NaN 8117.0 7463.0 84.0 2021-01-01

4 NaN 17568.0 11146.0 405.0 2021-01-01

In [16]: df.Date.unique()

Out[16]: array(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04',
'2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08',
'2021-01-09', '2021-01-10', '2021-01-11', '2021-01-12',
'2021-01-13', '2021-01-14', '2021-01-15', '2021-01-16',
'2021-01-17', '2021-01-18', '2021-01-19', '2021-01-20',
'2021-01-21', '2020-01-22', '2021-01-22', '2020-01-23',
'2021-01-23', '2020-01-24', '2021-01-24', '2020-01-25',
'2021-01-25', '2020-01-26', '2021-01-26', '2020-01-27',
'2021-01-27', '2020-01-28', '2021-01-28', '2020-01-29',
'2021-01-29', '2020-01-30', '2021-01-30', '2020-01-31',
'2021-01-31', '2020-02-01', '2021-02-01', '2020-02-02',
'2021-02-02', '2020-02-03', '2021-02-03', '2020-02-04',
'2021-02-04', '2020-02-05', '2021-02-05', '2020-02-06',
'2021-02-06', '2020-02-07', '2021-02-07', '2020-02-08',
'2021-02-08', '2020-02-09', '2021-02-09', '2020-02-10',
'2021-02-10', '2020-02-11', '2021-02-11', '2020-02-12',
'2021-02-12', '2020-02-13', '2020-02-14', '2020-02-15',
'2020-02-16', '2020-02-17', '2020-02-18', '2020-02-19',
'2020-02-20', '2020-02-21', '2020-02-22', '2020-02-23',
'2020-02-24', '2020-02-25', '2020-02-26', '2020-02-27',
'2020-02-28', '2020-02-29', '2020-03-01', '2020-03-02',
'2020-03-03', '2020-03-04', '2020-03-05', '2020-03-06',
'2020-03-07', '2020-03-08', '2020-03-09', '2020-03-10',
'2020-03-11', '2020-03-12', '2020-03-13', '2020-03-14',
'2020-03-15', '2020-03-16', '2020-03-17', '2020-03-18',
'2020-03-19', '2020-03-20', '2020-03-21', '2020-03-22',
'2020-03-23', '2020-03-24', '2020-03-25', '2020-03-26',
'2020-03-27', '2020-03-28', '2020-03-29', '2020-03-30',
'2020-03-31', '2020-04-01', '2020-04-02', '2020-04-03',
'2020-04-04', '2020-04-05', '2020-04-06', '2020-04-07',
'2020-04-08', '2020-04-09', '2020-04-10', '2020-04-11',
'2020-04-12', '2020-04-13', '2020-04-14', '2020-04-15',
'2020-04-16', '2020-04-17', '2020-04-18', '2020-04-19',
'2020-04-20', '2020-04-21', '2020-04-22', '2020-04-23',
'2020-04-24', '2020-04-25', '2020-04-26', '2020-04-27',
'2020-04-28', '2020-04-29', '2020-04-30', '2020-05-01',
'2020-05-02', '2020-05-03', '2020-05-04', '2020-05-05',
'2020-05-06', '2020-05-07', '2020-05-08', '2020-05-09',
'2020-05-10', '2020-05-11', '2020-05-12', '2020-05-13',
'2020-05-14', '2020-05-15', '2020-05-16', '2020-05-17',
'2020-05-18', '2020-05-19', '2020-05-20', '2020-05-21',
'2020-05-22', '2020-05-23', '2020-05-24', '2020-05-25',
'2020-05-26', '2020-05-27', '2020-05-28', '2020-05-29',
'2020-05-30', '2020-05-31', '2020-06-01', '2020-06-02',
'2020-06-03', '2020-06-04', '2020-06-05', '2020-06-06',
'2020-06-07', '2020-06-08', '2020-06-09', '2020-06-10',
'2020-06-11', '2020-06-12', '2020-06-13', '2020-06-14',
'2020-06-15', '2020-06-16', '2020-06-17', '2020-06-18',
'2020-06-19', '2020-06-20', '2020-06-21', '2020-06-22',
'2020-06-23', '2020-06-24', '2020-06-25', '2020-06-26',
'2020-06-27', '2020-06-28', '2020-06-29', '2020-06-30',
'2020-07-01', '2020-07-02', '2020-07-03', '2020-07-04',
'2020-07-05', '2020-07-06', '2020-07-07', '2020-07-08',
'2020-07-09', '2020-07-10', '2020-07-11', '2020-07-12',
'2020-07-13', '2020-07-14', '2020-07-15', '2020-07-16',
'2020-07-17', '2020-07-18', '2020-07-19', '2020-07-20',
'2020-07-21', '2020-07-22', '2020-07-23', '2020-07-24',
'2020-07-25', '2020-07-26', '2020-07-27', '2020-07-28',
'2020-07-29', '2020-07-30', '2020-07-31', '2020-08-01',
'2020-08-02', '2020-08-03', '2020-08-04', '2020-08-05',
'2020-08-06', '2020-08-07', '2020-08-08', '2020-08-09',
'2020-08-10', '2020-08-11', '2020-08-12', '2020-08-13',
'2020-08-14', '2020-08-15', '2020-08-16', '2020-08-17',
'2020-08-18', '2020-08-19', '2020-08-20', '2020-08-21',
'2020-08-22', '2020-08-23', '2020-08-24', '2020-08-25',
'2020-08-26', '2020-08-27', '2020-08-28', '2020-08-29',
'2020-08-30', '2020-08-31', '2020-09-01', '2020-09-02',
'2020-09-03', '2020-09-04', '2020-09-05', '2020-09-06',
'2020-09-07', '2020-09-08', '2020-09-09', '2020-09-10',
'2020-09-11', '2020-09-12', '2020-09-13', '2020-09-14',
'2020-09-15', '2020-09-16', '2020-09-17', '2020-09-18',
'2020-09-19', '2020-09-20', '2020-09-21', '2020-09-22',
'2020-09-23', '2020-09-24', '2020-09-25', '2020-09-26',
'2020-09-27', '2020-09-28', '2020-09-29', '2020-09-30',
'2020-10-01', '2020-10-02', '2020-10-03', '2020-10-04',
'2020-10-05', '2020-10-06', '2020-10-07', '2020-10-08',
'2020-10-09', '2020-10-10', '2020-10-11', '2020-10-12',
'2020-10-13', '2020-10-14', '2020-10-15', '2020-10-16',
'2020-10-17', '2020-10-18', '2020-10-19', '2020-10-20',
'2020-10-21', '2020-10-22', '2020-10-23', '2020-10-24',
'2020-10-25', '2020-10-26', '2020-10-27', '2020-10-28',
'2020-10-29', '2020-10-30', '2020-10-31', '2020-11-01',
'2020-11-02', '2020-11-03', '2020-11-04', '2020-11-05',
'2020-11-06', '2020-11-07', '2020-11-08', '2020-11-09',
'2020-11-10', '2020-11-11', '2020-11-12', '2020-11-13',
'2020-11-14', '2020-11-15', '2020-11-16', '2020-11-17',
'2020-11-18', '2020-11-19', '2020-11-20', '2020-11-21',
'2020-11-22', '2020-11-23', '2020-11-24', '2020-11-25',
'2020-11-26', '2020-11-27', '2020-11-28', '2020-11-29',
'2020-11-30', '2020-12-01', '2020-12-02', '2020-12-03',
'2020-12-04', '2020-12-05', '2020-12-06', '2020-12-07',
'2020-12-08', '2020-12-09', '2020-12-10', '2020-12-11',
'2020-12-12', '2020-12-13', '2020-12-14', '2020-12-15',
'2020-12-16', '2020-12-17', '2020-12-18', '2020-12-19',
'2020-12-20', '2020-12-21', '2020-12-22', '2020-12-23',
'2020-12-24', '2020-12-25', '2020-12-26', '2020-12-27',
'2020-12-28', '2020-12-29', '2020-12-30', '2020-12-31'],
dtype=object)

In [17]: df['Date']=pd.to_datetime(df.Date)
df.head()

Out[17]: Country/Region Confirmed Recovered Deaths Date

0 NaN 51526.0 41727.0 2191.0 2021-01-01

1 NaN 58316.0 33634.0 1181.0 2021-01-01

2 NaN 99897.0 67395.0 2762.0 2021-01-01

3 NaN 8117.0 7463.0 84.0 2021-01-01

4 NaN 17568.0 11146.0 405.0 2021-01-01

In [18]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1241952 entries, 0 to 1241951
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Country/Region 1070891 non-null object
1 Confirmed 1241933 non-null float64
2 Recovered 1241566 non-null float64
3 Deaths 1241520 non-null float64
4 Date 1241952 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(3), object(1)
memory usage: 47.4+ MB

In [19]: # Extract year, month, and day into separate columns

df['year']=pd.DatetimeIndex(df.Date).year
df['month']=pd.DatetimeIndex(df.Date).month
df['day']=pd.DatetimeIndex(df.Date).day
df.head()
Out[19]: Country/Region Confirmed Recovered Deaths Date year month day

0 NaN 51526.0 41727.0 2191.0 2021-01-01 2021 1 1

1 NaN 58316.0 33634.0 1181.0 2021-01-01 2021 1 1

2 NaN 99897.0 67395.0 2762.0 2021-01-01 2021 1 1

3 NaN 8117.0 7463.0 84.0 2021-01-01 2021 1 1

4 NaN 17568.0 11146.0 405.0 2021-01-01 2021 1 1
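
Since Date is already datetime64 after In [17], the .dt accessor gives the same columns without constructing a DatetimeIndex; a minimal equivalent sketch:

# Equivalent extraction via the .dt accessor
df['year'] = df['Date'].dt.year
df['month'] = df['Date'].dt.month
df['day'] = df['Date'].dt.day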

In [20]: df.month.replace({
1:'January',
2:'February',
3:'March',
4:'April',
5:'May',
6:'June',
7:'July',
8:'August',
9:'September',
10:'October',
11:'November',
12:'December'
},inplace=True)

df.sample(10)

Out[20]: Country/Region Confirmed Recovered Deaths Date year month day

470398 Ukraine 323.0 219.0 7.0 2020-06-18 2020 June 18

1071712 US 40.0 0.0 0.0 2020-11-19 2020 November 19

1230639 Sudan 25060.0 13524.0 1468.0 2020-12-29 2020 December 29

1108078 US 6041.0 0.0 122.0 2020-11-28 2020 November 28

637690 US 123.0 0.0 7.0 2020-08-01 2020 August 1

477983 France 505.0 460.0 1.0 2020-06-20 2020 June 20

443514 US 43.0 0.0 0.0 2020-06-11 2020 June 11

1173596 US 2225.0 0.0 17.0 2020-12-14 2020 December 14

903265 US 221.0 0.0 8.0 2020-10-07 2020 October 7

146967 NaN 4193.0 0.0 57.0 2021-02-06 2021 February 6
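
As an alternative to the replace() mapping above, dt.month_name() produces the same month labels directly from the Date column; a minimal sketch:

# Month names straight from the datetime column
df['month'] = df['Date'].dt.month_name()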

EDA
Rename the column Country/Region to Region

In [21]: df.rename(columns={'Country/Region':'Region'},inplace=True)
df.head()

Out[21]: Region Confirmed Recovered Deaths Date year month day

0 NaN 51526.0 41727.0 2191.0 2021-01-01 2021 January 1

1 NaN 58316.0 33634.0 1181.0 2021-01-01 2021 January 1


2 NaN 99897.0 67395.0 2762.0 2021-01-01 2021 January 1

3 NaN 8117.0 7463.0 84.0 2021-01-01 2021 January 1

4 NaN 17568.0 11146.0 405.0 2021-01-01 2021 January 1

Count of persons who recovered in January

In [22]: df.query('month=="January"')['Recovered'].sum()

Out[22]: 1614355505.0

Count of persons confirmed in March

In [23]: df.query('month=="March"')['Confirmed'].sum()

Out[23]: 8900596.0

Count of persons confirmed on January 22

In [24]: df.query('month=="January" & day==22')['Confirmed'].sum()

Out[24]: 98204777.0

Count of persons who died on February 17

In [25]: df.query('month=="February" & day==17 ')['Deaths'].sum()

Out[25]: 1868.0

Count of persons confirmed in February 2020

In [26]: df.query('month=="February" & year==2020 ')['Confirmed'].sum()

Out[26]: 1671960.0
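
The same filters can be written with boolean masks against the Date column instead of query(); a minimal sketch reproducing the February 2020 case:

# Boolean-mask equivalent of the February 2020 query
mask = (df['Date'] >= '2020-02-01') & (df['Date'] <= '2020-02-29')
df.loc[mask, 'Confirmed'].sum()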

Total number of deaths from each region

In [27]: tot_death=df.groupby(['Region'])['Deaths'].sum()
tot_death

Out[27]: Region
Afghanistan 297323.0
Albania 88375.0
Algeria 377806.0
Andorra 15492.0
Angola 36818.0
...
Vietnam 4908.0
West Bank and Gaza 74931.0
Yemen 108944.0
Zambia 53042.0
Zimbabwe 36683.0
Name: Deaths, Length: 218, dtype: float64

In [28]: df.columns

Out[28]: Index(['Region', 'Confirmed', 'Recovered', 'Deaths', 'Date', 'year', 'month',
       'day'],
      dtype='object')

In [29]: tot_death.max()

Out[29]: 46511368.0

In [30]: tot_death.mean()

Out[30]: 1024449.7889908256
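
To see which region is behind that maximum, idxmax() returns the corresponding index label; a minimal sketch:

# Region with the highest total deaths
tot_death.idxmax()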

Total deaths in China

In [31]: df.query('Region=="China"')['Deaths'].sum()

Out[31]: 1415355.0

Number of people who recovered in India

In [32]: # Note: .sum without parentheses returns the bound method itself, not the total
df.query('Region=="India"')['Recovered'].sum

Out[32]: <bound method NDFrame._add_numeric_operations.<locals>.sum of 115785 NaN
119825 NaN
123860 0.0
127899 0.0
131943 0.0
...
1238255 32751.0
1238256 0.0
1238257 562459.0
1238258 84149.0
1238259 528829.0
Name: Recovered, Length: 7669, dtype: float64>
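
Calling the method with parentheses produces the actual total; a minimal sketch of the corrected cell (the resulting number is not reproduced here):

# Corrected: call .sum() to compute the total recovered count for India
df.query('Region=="India"')['Recovered'].sum()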

Mean and sum of confirmed cases for each region

In [33]: mean_sum=df.groupby(['Region'])['Confirmed'].agg(['mean','sum'])
mean_sum

Out[33]: mean sum

Region

Afghanistan 27197.250000 8485542.0

Albania 12508.536913 3727544.0

Algeria 34030.144695 10583375.0

Andorra 2238.760656 682822.0

Angola 4680.871080 1343410.0

... ... ...

Vietnam 637.607558 219337.0

West Bank and Gaza 29876.505119 8753816.0

Yemen 1437.240602 382306.0

Zambia 8421.740484 2433883.0

Zimbabwe 4656.038328 1336283.0


218 rows × 2 columns

Mean and sum of recovered cases for each region

In [34]: mean_sum1=df.groupby(['Region'])['Recovered'].agg(['mean','sum'])
mean_sum1

Out[34]: mean sum

Region

Afghanistan 19225.365385 5998314.0

Albania 6710.644295 1999772.0

Algeria 22967.318328 7142836.0

Andorra 1813.675410 553171.0

Angola 2261.926829 649173.0

... ... ...

Vietnam 542.295522 181669.0

West Bank and Gaza 23037.781570 6750070.0

Yemen 829.165414 220558.0

Zambia 7714.425606 2229469.0

Zimbabwe 3678.376307 1055694.0

218 rows × 2 columns

Mean and sum of deaths for each region

In [35]: mean_sum1=df.groupby(['Region'])['Deaths'].agg(['mean','sum'])
mean_sum1

Out[35]: mean sum

Region

Afghanistan 952.958333 297323.0

Albania 296.560403 88375.0

Algeria 1214.810289 377806.0

Andorra 50.793443 15492.0

Angola 128.285714 36818.0

... ... ...

Vietnam 14.650746 4908.0

West Bank and Gaza 255.737201 74931.0

Yemen 409.563910 108944.0

Zambia 183.536332 53042.0

Zimbabwe 127.815331 36683.0

218 rows × 2 columns
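
The three per-region summaries above can also be computed in a single groupby call; a minimal sketch:

# Mean and sum of all three measures per region in one aggregation
summary = df.groupby('Region')[['Confirmed', 'Recovered', 'Deaths']].agg(['mean', 'sum'])
summary.head()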


In [36]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1241952 entries, 0 to 1241951
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Region 1070891 non-null object
1 Confirmed 1241933 non-null float64
2 Recovered 1241566 non-null float64
3 Deaths 1241520 non-null float64
4 Date 1241952 non-null datetime64[ns]
5 year 1241952 non-null int64
6 month 1241952 non-null object
7 day 1241952 non-null int64
dtypes: datetime64[ns](1), float64(3), int64(2), object(2)
memory usage: 75.8+ MB

In [37]: # Label encoding

In [38]: # Converting object columns to numeric

In [39]: # Done after visualization

In [40]: from sklearn.preprocessing import LabelEncoder


le=LabelEncoder()
df['Region']=le.fit_transform(df['Region'])
df['month']=le.fit_transform(df['month'])
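
One caveat, offered as an assumption rather than something shown in the notebook: Region contains NaN values (see df.info() above), and some scikit-learn versions refuse to encode a column mixing strings and NaN, so filling it first is a safe pre-step. A minimal sketch of the same cell with that handling added (the 'Unknown' fill value is illustrative):

# Fill missing regions before encoding (fill value is illustrative)
df['Region'] = df['Region'].fillna('Unknown')
df['Region'] = le.fit_transform(df['Region'])
df['month'] = le.fit_transform(df['month'])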

In [ ]:
