Exercise 3
Introduction to Pandas:
Import Pandas
Once Pandas is installed, import it into your applications with the import keyword: import pandas
Now Pandas is imported and ready to use.
Example
import pandas
mydataset = {'cars': ["BMW", "Volvo", "Ford"], 'passings': [3, 7, 2]}
myvar = pandas.DataFrame(mydataset)
print(myvar)
What is a DataFrame?
A Pandas DataFrame is a 2-dimensional data structure, like a 2-dimensional array, or a table with rows and columns.
Example
Create a simple Pandas DataFrame:
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
#load data into a DataFrame object:
df = pd.DataFrame(data)
print(df)
Result
calories duration
0 420 50
1 380 40
2 390 45
Locate Row
As you can see from the result above, the DataFrame is like a table with rows and columns. Pandas uses the loc attribute to return one or more specified rows.
Example
Return row 0:
#refer to the row index:
print(df.loc[0])
Result
calories    420
duration     50
Name: 0, dtype: int64
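The loc attribute also accepts a list of indexes, returning the matching rows as a new DataFrame. A minimal sketch, reusing the small calories/duration DataFrame from above:

```python
import pandas as pd

# The same small DataFrame used in the examples above
data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data)

# A list of indexes returns a DataFrame (a single index returns a Series)
rows = df.loc[[0, 1]]
print(rows)
```

Note the double brackets: df.loc[0] returns a Series, while df.loc[[0, 1]] returns a DataFrame with the selected rows.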
Read CSV Files
A simple way to store big data sets is to use CSV files (comma separated files).
CSV files contain plain text and are a well-known format that can be read by everyone, including Pandas. In our examples we will be using a CSV file called 'data.csv'.
https://www.w3schools.com/python/pandas/data.csv
Example
Load the CSV into a DataFrame:
import pandas as pd

df = pd.read_csv('data.csv')
print(df.to_string())
Tip: use to_string() to print the entire DataFrame.
If you have a large DataFrame with many rows, Pandas will only return the first 5 rows and the last 5 rows:
Example
Print the DataFrame without the to_string() method:
import pandas as pd

df = pd.read_csv('data.csv')
print(df)
max_rows
The number of rows returned is defined in Pandas option settings.
You can check your system's maximum rows with the pd.options.display.max_rows statement.
Example
Check the number of maximum returned rows:
import pandas as pd

print(pd.options.display.max_rows)
In my system the number is 60, which means that if the DataFrame contains
more than 60 rows, the print(df) statement will return only the headers and the
first and last 5 rows.
You can change the maximum rows number with the same statement.
Example
Increase the maximum number of rows to display the entire DataFrame:
import pandas as pd

pd.options.display.max_rows = 9999
df = pd.read_csv('data.csv')
print(df)
The head() method returns the headers and a specified number of rows, starting from the top.
Example
Get a quick overview by printing the first 10 rows of the DataFrame:
import pandas as pd

df = pd.read_csv('data.csv')
print(df.head(10))
Note: if the number of rows is not specified, the head() method will return the top 5 rows.
Example
Print the first 5 rows of the DataFrame:
import pandas as pd

df = pd.read_csv('data.csv')
print(df.head())
There is also a tail() method for viewing the last rows of the DataFrame.
The tail() method returns the headers and a specified number of rows, starting from the bottom.
Example
Print the last 5 rows of the DataFrame:
print(df.tail())
Info About the Data Set
The info() method gives you more information about the data set:
Example
print(df.info())
Result
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 169 entries, 0 to 168
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Duration  169 non-null    int64
 1   Pulse     169 non-null    int64
 2   Maxpulse  169 non-null    int64
 3   Calories  164 non-null    float64
Result Explained
The result tells us there are 169 rows and 4 columns:
RangeIndex: 169 entries, 0 to 168
Data columns (total 4 columns):
Null Values
The info() method also tells us how many Non-Null values there are present in each
column, and in our data set it seems like there are 164 of 169 Non-Null values in the
"Calories" column.
This means that there are 5 rows with no value at all in the "Calories" column, for whatever reason. Empty values, or Null values, can be bad when analyzing data, and you should consider removing rows with empty values. This is a step towards what is called cleaning data.
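Those missing values can also be counted directly with isnull().sum(). A minimal sketch on a toy "Calories" column (the values here are illustrative, not the full data.csv file):

```python
import pandas as pd
import numpy as np

# Toy column with two missing entries, standing in for "Calories"
df = pd.DataFrame({"Calories": [409.1, np.nan, 340.0, np.nan, 282.4]})

# Count the NaN cells in the column
missing = df["Calories"].isnull().sum()
print(missing)  # 2
```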
Pandas - Cleaning Data
Data Cleaning
Data cleaning means fixing bad data in your data set.
Bad data could be:
● Empty cells
● Data in wrong format
● Wrong data
● Duplicates
In this tutorial you will learn how to deal with all of them.
Our Data Set
In the next chapters we will use this data set:
Duration Date Pulse Maxpulse Calories
0 60 '2020/12/01' 110 130 409.1
1 60 '2020/12/02' 117 145 479.0
2 60 '2020/12/03' 103 135 340.0
3 45 '2020/12/04' 109 175 282.4
4 45 '2020/12/05' 117 148 406.0
5 60 '2020/12/06' 102 127 300.0
6 60 '2020/12/07' 110 136 374.0
7 450 '2020/12/08' 104 134 253.3
8 30 '2020/12/09' 109 133 195.1
9 60 '2020/12/10' 98 124 269.0
10 60 '2020/12/11' 103 147 329.3
11 60 '2020/12/12' 100 120 250.7
12 60 '2020/12/12' 100 120 250.7
13 60 '2020/12/13' 106 128 345.3
14 60 '2020/12/14' 104 132 379.3
15 60 '2020/12/15' 98 123 275.0
16 60 '2020/12/16' 98 120 215.2
17 60 '2020/12/17' 100 120 300.0
18 45 '2020/12/18' 90 112 NaN
19 60 '2020/12/19' 103 123 323.0
20 45 '2020/12/20' 97 125 243.0
21 60 '2020/12/21' 108 131 364.2
22 45 NaN 100 119 282.0
23 60 '2020/12/23' 130 101 300.0
24 45 '2020/12/24' 105 132 246.0
25 60 '2020/12/25' 102 126 334.5
26 60 2020/12/26 100 120 250.0
27 60 '2020/12/27' 92 118 241.0
28 60 '2020/12/28' 103 132 NaN
29 60 '2020/12/29' 100 132 280.0
30 60 '2020/12/30' 102 129 380.3
31 60 '2020/12/31' 92 115 243.0
The data set contains some empty cells ("Date" in row 22, and "Calories" in rows 18 and 28). The data set contains wrong format ("Date" in row 26). The data set contains wrong data ("Duration" in row 7). The data set contains duplicates (rows 11 and 12).
Pandas - Cleaning Empty Cells
Empty Cells
Empty cells can potentially give you a wrong result when you analyze data.
Remove Rows
One way to deal with empty cells is to remove rows that contain empty cells.
This is usually OK, since data sets can be very big, and removing a few rows will
not have a big impact on the result.
Example
Return a new Data Frame with no empty cells:
import pandas as pd

df = pd.read_csv('data.csv')
new_df = df.dropna()
print(new_df.to_string())
Note: By default, the dropna() method returns a new DataFrame, and will not change the original. If you want to change the original DataFrame, use the inplace = True argument:
Example
Remove all rows with NULL values:
import pandas as pd

df = pd.read_csv('data.csv')
df.dropna(inplace = True)
print(df.to_string())
Note: Now, the dropna(inplace = True) will NOT return a new DataFrame, but it will remove all rows
containing NULL values from the original DataFrame.
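Instead of removing rows, the fillna() method replaces empty cells with a value of your choice. A minimal sketch on a toy DataFrame (column names borrowed from the tutorial's data set, values invented):

```python
import pandas as pd
import numpy as np

# Toy DataFrame with one empty cell in each column
df = pd.DataFrame({"Duration": [60, 45, np.nan],
                   "Calories": [300.0, np.nan, 250.0]})

# Replace every NaN in the whole DataFrame with 130
df_filled = df.fillna(130)
print(df_filled)
```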
To only replace empty values for one column, specify the column name for the DataFrame:
Example
Replace NULL values in the "Calories" column with the number 130:
import pandas as pd

df = pd.read_csv('data.csv')
df["Calories"].fillna(130, inplace = True)
Replace Using Mean, Median, or Mode
A common way to replace empty cells is to calculate the mean, median or mode value of the column.
Pandas uses the mean(), median() and mode() methods to calculate the respective values for a specified column:
Example
Calculate the MEAN, and replace any empty values with it:
import pandas as pd

df = pd.read_csv('data.csv')
x = df["Calories"].mean()
df["Calories"].fillna(x, inplace = True)
Mean = the average value (the sum of all values divided by number of values).
Example
Calculate the MEDIAN, and replace any empty values with it:
import pandas as pd

df = pd.read_csv('data.csv')
x = df["Calories"].median()
df["Calories"].fillna(x, inplace = True)
Median = the value in the middle, after you have sorted all values ascending.
Example
Calculate the MODE, and replace any empty values with it:
import pandas as pd

df = pd.read_csv('data.csv')
x = df["Calories"].mode()[0]
df["Calories"].fillna(x, inplace = True)
Mode = the value that appears most frequently.
Pandas - Cleaning Data of Wrong Format
Data of Wrong Format
Cells with data of wrong format can make it difficult, or even impossible, to analyze data.
To fix it, you have two options: remove the rows, or convert all cells in the columns into
the same format.
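The conversion itself is done with the to_datetime() method; on the full data set it would be df['Date'] = pd.to_datetime(df['Date']). A self-contained sketch on a toy 'Date' column with one empty cell, mimicking row 22:

```python
import pandas as pd
import numpy as np

# Toy 'Date' column: two valid dates plus one empty cell
df = pd.DataFrame({"Date": ["2020/12/25", "2020/12/26", np.nan]})

# Convert the whole column into pandas datetime objects;
# the empty cell becomes NaT (Not a Time)
df["Date"] = pd.to_datetime(df["Date"])
print(df)
```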
Result:
As you can see from the result, the date in row 26 was fixed, but the empty date in row
22 got a NaT (Not a Time) value, in other words an empty value. One way to deal with
empty values is simply removing the entire row.
Removing Rows
The result from the converting in the example above gave us a NaT value, which can
be handled as a NULL value, and we can remove the row by using the dropna()
method.
Example
Remove rows with a NULL value in the "Date" column:
df.dropna(subset=['Date'], inplace = True)
Pandas - Fixing Wrong Data
Wrong Data
"Wrong data" does not have to be "empty cells" or "wrong format", it can just be wrong,
like if someone registered "199" instead of "1.99".
Sometimes you can spot wrong data by looking at the data set, because you have an
expectation of what it should be.
If you take a look at our data set, you can see that in row 7, the duration is 450, but for
all the other rows the duration is between 30 and 60.
It doesn't have to be wrong, but taking into consideration that this is the data set of someone's workout sessions, we conclude that this person did not work out for 450 minutes.
Duration Date Pulse Maxpulse Calories
0 60 '2020/12/01' 110 130 409.1
1 60 '2020/12/02' 117 145 479.0
2 60 '2020/12/03' 103 135 340.0
3 45 '2020/12/04' 109 175 282.4
4 45 '2020/12/05' 117 148 406.0
5 60 '2020/12/06' 102 127 300.0
6 60 '2020/12/07' 110 136 374.0
7 450 '2020/12/08' 104 134 253.3
8 30 '2020/12/09' 109 133 195.1
9 60 '2020/12/10' 98 124 269.0
10 60 '2020/12/11' 103 147 329.3
11 60 '2020/12/12' 100 120 250.7
12 60 '2020/12/12' 100 120 250.7
13 60 '2020/12/13' 106 128 345.3
14 60 '2020/12/14' 104 132 379.3
15 60 '2020/12/15' 98 123 275.0
16 60 '2020/12/16' 98 120 215.2
17 60 '2020/12/17' 100 120 300.0
18 45 '2020/12/18' 90 112 NaN
19 60 '2020/12/19' 103 123 323.0
20 45 '2020/12/20' 97 125 243.0
21 60 '2020/12/21' 108 131 364.2
22 45 NaN 100 119 282.0
23 60 '2020/12/23' 130 101 300.0
24 45 '2020/12/24' 105 132 246.0
25 60 '2020/12/25' 102 126 334.5
26 60 20201226 100 120 250.0
27 60 '2020/12/27' 92 118 241.0
28 60 '2020/12/28' 103 132 NaN
29 60 '2020/12/29' 100 132 280.0
30 60 '2020/12/30' 102 129 380.3
31 60 '2020/12/31' 92 115 243.0
How can we fix wrong values, like the one for "Duration" in row 7?
Replacing Values
One way to fix wrong values is to replace them with something else.
In our example, it is most likely a typo, and the value should be "45" instead of "450", so we could just insert "45" in row 7:
Example
Set "Duration" = 45 in row 7:
df.loc[7, 'Duration'] = 45
For small data sets you might be able to replace the wrong data one by one, but not for
big data sets.
To replace wrong data for larger data sets you can create some rules, e.g. set some
boundaries for legal values, and replace any values that are outside of the boundaries.
Example
Loop through all values in the "Duration" column; if a value is higher than 120, set it to 120:
for x in df.index:
  if df.loc[x, "Duration"] > 120:
    df.loc[x, "Duration"] = 120
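The same capping rule can be written without an explicit loop by using boolean indexing, which is the more idiomatic (and faster) pandas style. A sketch on a toy "Duration" column:

```python
import pandas as pd

# Toy column containing one out-of-bounds value (450)
df = pd.DataFrame({"Duration": [60, 450, 45, 30]})

# Cap every value above 120 at 120 in one vectorized assignment
df.loc[df["Duration"] > 120, "Duration"] = 120
print(df["Duration"].tolist())  # [60, 120, 45, 30]
```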
Removing Rows
Another way of handling wrong data is to remove the rows that contain wrong data.
This way you do not have to find out what to replace them with, and there is a good
chance you do not need them to do your analyses.
Example
Delete rows where "Duration" is higher than 120:
for x in df.index:
  if df.loc[x, "Duration"] > 120:
    df.drop(x, inplace = True)
Pandas - Removing Duplicates
Discovering Duplicates
Discovering
Duplicates
Duplicate rows are rows that have been registered more than one time.
The duplicated() method returns a Boolean value for each row:
Example
Return True for every row that is a duplicate, otherwise False:
print(df.duplicated())
Removing Duplicates
To remove duplicates, use the drop_duplicates() method.
Example
Remove all duplicates:
df.drop_duplicates(inplace = True)
Pandas - Data Correlations
Finding Relationships
Finding Relationships
A great aspect of the Pandas module is the corr() method.
The corr() method calculates the relationship between each column in your data set. The examples on this page use a CSV file called 'data.csv'.
Download data.csv - https://www.w3schools.com/python/pandas/data.csv
Example
Show the relationship between the columns:
df.corr()
Result
          Duration     Pulse  Maxpulse  Calories
Duration  1.000000 -0.155408  0.009403  0.922721
Pulse    -0.155408  1.000000  0.786535  0.025120
Maxpulse  0.009403  0.786535  1.000000  0.203814
Calories  0.922721  0.025120  0.203814  1.000000
Note: The corr() method ignores "not numeric" columns.
Result Explained
The result of the corr() method is a table with a lot of numbers that represent how well the relationship is between two columns.
The number varies from -1 to 1.
1 means that there is a 1 to 1 relationship (a perfect correlation), and for this data set, each time a value went up in the first column, the other one went up as well.
0.9 is also a good relationship, and if you increase one value, the other will probably increase as well.
-0.9 would be just as good a relationship as 0.9, but if you increase one value, the other will probably go down.
0.2 means NOT a good relationship, meaning that if one value goes up, that does not mean the other will.
What is a good correlation? It depends on the use, but it is safe to say you have to have at least 0.6 (or -0.6) to call it a good correlation.
Perfect Correlation:
We can see that "Duration" and "Duration" got the number 1.000000, which makes sense,
each column always has a perfect relationship with itself.
Good Correlation:
"Duration" and "Calories" got a 0.922721 correlation, which is a very good
correlation, and we can predict that the longer you work out, the more calories you
burn, and the other way around: if you burned a lot of calories, you probably had a
long work out.
Bad Correlation:
"Duration" and "Maxpulse" got a 0.009403 correlation, which is a very bad correlation,
meaning that we can not predict the max pulse by just looking at the duration of the
work out, and vice versa.
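The behaviour of corr() is easy to verify on a small hand-made DataFrame; a sketch (column names borrowed from the tutorial, values invented for illustration; on newer pandas versions, pass numeric_only=True if text columns are present):

```python
import pandas as pd

# Two closely related toy columns
df = pd.DataFrame({
    "Duration": [60, 45, 60, 30, 60],
    "Calories": [300.0, 240.0, 310.0, 180.0, 295.0],
})

corr = df.corr()
# The diagonal is always 1.0: a column correlates perfectly with itself;
# Duration and Calories move together, so their correlation is close to 1
print(corr)
```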
Pandas - Plotting
Plotting
Pandas uses the plot() method to create diagrams.
We can use Pyplot, a submodule of the Matplotlib library to visualize the diagram on the
screen.
Example
Import pyplot from Matplotlib and visualize our DataFrame:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')
df.plot()
plt.show()
The examples on this page use a CSV file called 'data.csv'.
https://www.w3schools.com/python/pandas/data.csv
Scatter Plot
Specify that you want a scatter plot with the kind argument:
kind = 'scatter'
A scatter plot needs an x- and a y-axis.
In the example below we will use "Duration" for the x-axis and
"Calories" for the y-axis. Include the x and y arguments like this:
x = 'Duration', y = 'Calories'
Example
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')
df.plot(kind = 'scatter', x = 'Duration', y = 'Calories')
plt.show()
Result
Remember: In the previous example, we learned that the correlation between "Duration" and "Calories" was 0.922721, and we concluded that a higher duration means more calories burned.
By looking at the scatterplot, I agree.
Let's create another scatterplot, where there is a bad relationship between the
columns, like "Duration" and "Maxpulse", with the correlation 0.009403:
Example
A scatterplot where there is no relationship between the columns:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')
df.plot(kind = 'scatter', x = 'Duration', y = 'Maxpulse')
plt.show()
Result
Histogram
Use the kind argument to specify that you want a histogram:
kind = 'hist'
A histogram needs only one column.
A histogram shows us the frequency of each interval, e.g. how many workouts lasted
between 50 and 60 minutes?
In the example below we will use the "Duration" column to create the histogram:
Example
df["Duration"].plot(kind = 'hist')
Write a Pandas program to calculate the sum of the examination attempts by the students.
Sample DataFrame:
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew',
'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
Python Code :
import pandas as pd
import numpy as np

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
  'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
  'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
  'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data, index=labels)
print("\nSum of the examination attempts by the students:")
print(df['attempts'].sum())
Write a Pandas program to select a specific row of given series/dataframe by integer index.
Test Data:
0 s001 V  Alberto Franco 15/05/2002 35 street1 t1
1 s002 V  Gino Mcneill   17/05/2002 32 street2 t2
2 s003 VI Ryan Parkes    16/02/1999 33 street3 t3
3 s001 VI Eesha Hinton   25/09/1998 30 street1 t4
4 s002 V  Gino Mcneill   11/05/2002 31 street2 t5
5 s004 VI David Parkes   15/09/1997 32 street4 t6
Python Code :
import pandas as pd

ds = pd.Series([1,3,5,7,9,11,13,15], index=[0,1,2,3,4,5,7,8])
print("Original Series:")
print(ds)
print("\nPrint specified row from the said series using location based indexing:")
print("\nThird row:")
print(ds.iloc[[2]])
print("\nFifth row:")
print(ds.iloc[[4]])

df = pd.DataFrame({
  'school_code': ['s001','s002','s003','s001','s002','s004'],
  'class': ['V', 'V', 'VI', 'VI', 'V', 'VI'],
  'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill', 'David Parkes'],
  'date_of_birth': ['15/05/2002','17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
  'weight': [35, 32, 33, 30, 31, 32]})
print("Original DataFrame with single index:")
print(df)
print("\nPrint specified row from the said DataFrame using location based indexing:")
print("\nThird row:")
print(df.iloc[[2]])
print("\nFifth row:")
print(df.iloc[[4]])
Write a Pandas program to add, subtract, multiple and divide two Pandas Series.
Sample Series: [2, 4, 6, 8, 10], [1, 3, 5, 7, 9]
Python Code :
import pandas as pd

ds1 = pd.Series([2, 4, 6, 8, 10])
ds2 = pd.Series([1, 3, 5, 7, 9])
ds = ds1 + ds2
print("Add two Series:")
print(ds)
print("Subtract two Series:")
ds = ds1 - ds2
print(ds)
print("Multiply two Series:")
ds = ds1 * ds2
print(ds)
print("Divide Series1 by Series2:")
ds = ds1 / ds2
print(ds)
Write a Pandas program to compare the elements of the two Pandas Series.
Sample Series: [2, 4, 6, 8, 10], [1, 3, 5, 7, 10]
Python Code :
import pandas as pd

ds1 = pd.Series([2, 4, 6, 8, 10])
ds2 = pd.Series([1, 3, 5, 7, 10])
print("Series1:")
print(ds1)
print("Series2:")
print(ds2)
print("Compare the elements of the said Series:")
print("Equals:")
print(ds1 == ds2)
print("Greater than:")
print(ds1 > ds2)
print("Less than:")
print(ds1 < ds2)
Create the Excel file with the name 'coalpublic2013.xlsx' and write programs for the following.
Year  MSHA ID  Mine_Name                       Production  Labor_Hours
2013  103381   Tacoa Highwall Miner            56,004      22,392
2013  103404   Reid School Mine                28,807      8,447
2013  100759   North River #1 Underground Min  14,40,115   4,74,784
2013  103246   Bear Creek                      87,587      29,193
2013  103451   Knight Mine                     1,47,499    46,393
2013  103433   Crane Central Mine              69,339      47,195
2013  100329   Concord Mine                    0           1,44,002
2013  100851   Oak Grove Mine                  22,69,014   10,01,809
2013  102901   Shoal Creek Mine                0           12,396
2013  102901   Shoal Creek Mine                14,53,024   12,37,415
2013  103180   Sloan Mountain Mine             3,27,780    1,96,963
2013  103182   Fishtrap                        1,75,058    87,314
2013  103285   Narley Mine                     1,54,861    90,584
2013  103332   Powhatan Mine                   1,40,521    61,394
2013  103375   Johnson Mine                    580         1,900
2013  103419   Maxine-Pratt Mine               1,25,824    1,07,469
2013  103432   Skelton Creek                   8,252       220
2013  103437   Black Warrior Mine No           11,45,924   70,926
2013  102976   Piney Woods Preparation Plant   0           14,828
2013  102976   Piney Woods Preparation Plant   0           23,193
2013  103380   Calera                          0           12,621
2013  103380   Calera                          0           1,402
2013  103422   Clark No 1 Mine                 1,22,727    1,40,250
2013  103323   Deerlick Mine                   1,33,452    46,381
2013  103364   Brc Alabama No. 7 Llc           0           14,324
2013  103436   Swann's Crossing                1,37,511    77,190
2013  100347   Choctaw Mine                    5,37,429    2,15,295
2013  101362   Manchester Mine                 2,19,457    1,16,914
2013  102996   Jap Creek Mine                  3,75,715    1,64,093
2013  103370   Cresent Valley Mine             2,860       621
Write a Pandas program to import given excel data into a Pandas dataframe.
Python Code :
import pandas as pd
import numpy as np

df = pd.read_excel(r'E:\coalpublic2013.xlsx')
print(df.head())
Write a Pandas program to import some excel data, skipping the first twenty rows, into a Pandas dataframe.
Python Code :
import pandas as pd
import numpy as np

df = pd.read_excel(r'E:\coalpublic2013.xlsx', skiprows = 20)
print(df)
Write a Pandas program to find the sum, mean, max, min value of the 'Production' column of the excel file.
Python Code :
import pandas as pd
import numpy as np

df = pd.read_excel(r'E:\coalpublic2013.xlsx')
print("Sum: ", df["Production"].sum())
print("Mean: ", df["Production"].mean())
print("Maximum: ", df["Production"].max())
print("Minimum: ", df["Production"].min())
Write a Pandas program to insert a column in the sixth position of the said excel sheet and fill it with NaN values.
Python Code :
import pandas as pd
import numpy as np

df = pd.read_excel(r'E:\coalpublic2013.xlsx')
df.insert(5, "column1", np.nan)  # position index 5 = sixth column
print(df.head())
Write a Pandas program to import given excel data into a dataframe and find all records that include two specific MSHA IDs.
Python Code :
import pandas as pd
import numpy as np

df = pd.read_excel(r'E:\coalpublic2013.xlsx')
print(df[df["MSHA ID"].isin([102976, 103380])].head())
Result:
Thus, the study of and work with Pandas DataFrames was carried out, and the exercises were completed successfully.