Pandas

Python - Pandas
What is Pandas?
Pandas is a Python library used for working with data sets.
It has functions for analyzing, cleaning, exploring, and manipulating data.
The name "Pandas" refers to both "Panel Data" and "Python Data Analysis".
The library was created by Wes McKinney in 2008.
Why Use Pandas?
Pandas allows us to analyze big data and draw conclusions based on
statistical theories.
Pandas can clean messy data sets, and make them readable and relevant.
Relevant data is very important in data science.
What Can Pandas Do?

Pandas gives you answers about the data, such as:
Is there a correlation between two or more columns?
What is the average value?
What is the maximum value?
What is the minimum value?
Pandas can also delete rows that are not relevant or that contain wrong values, such as
empty or NULL values. This is called cleaning the data.
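The questions above can be sketched in a few lines. This is only a minimal illustration: the column names and numbers are made up for the example, and the missing measurement is written as None.

```python
import pandas as pd

# Hypothetical sample data; the None entry stands in for a missing measurement.
df = pd.DataFrame({
    "duration": [60, 45, 30, 60],
    "calories": [409.1, 282.4, None, 300.0],
})

# Cleaning: drop every row that contains an empty (missing) value.
clean = df.dropna()

# Summary questions about a single column.
print("Average calories:", clean["calories"].mean())
print("Max calories:", clean["calories"].max())
print("Min calories:", clean["calories"].min())

# Correlation between two columns (Pearson by default).
print("Correlation:", clean["duration"].corr(clean["calories"]))
```

Note that dropna() returns a new DataFrame and leaves the original df unchanged.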
Where is the Pandas Codebase?
The source code for Pandas is located in this GitHub repository:
https://github.com/pandas-dev/pandas
1. Pandas Getting Started
1.1 Installation of Pandas
If you have Python and PIP already installed on a system, then installation
of Pandas is very easy.
Install it using this command:
C:\Users\Your Name>pip install pandas
1.2 Import Pandas
Once Pandas is installed, import it in your applications by adding the
"import" keyword:
Syntax : import pandas

Now Pandas is imported and ready to use.
import pandas
mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}
myvar = pandas.DataFrame(mydataset)
print(myvar)
OUTPUT :
cars passings
0 BMW 3
1 Volvo 7
2 Ford 2
1.3 Pandas as pd
Pandas is usually imported under the pd alias.
alias: In Python, an alias is an alternate name for referring to the same thing.
Create an alias with the "as" keyword while importing:
Syntax : import pandas as pd

Now the Pandas package can be referred to as pd instead of pandas.
import pandas as pd
mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}
myvar = pd.DataFrame(mydataset)
print(myvar)
OUTPUT :
cars passings
0 BMW 3
1 Volvo 7
2 Ford 2
1.4 Checking Pandas Version

The version string is stored in the "__version__" attribute.
The value printed depends on the version you have installed.
import pandas as pd
print(pd.__version__)
OUTPUT :
1.2.2
2. Pandas Series
2.1 What is a Series?
A Pandas Series is like a column in a table.
It is a one-dimensional array holding data of any type.
Example 2.1 : Create a simple Pandas Series from a list - int, float, string
import pandas as pd
a = [1, 2, 3]
myvar = pd.Series(a)
print(myvar)
OUTPUT :
0 1
1 2
2 3
dtype: int64
The datatype of the elements in this Series is int64.
Pandas decides the datatype of a Series based on the values it contains.
import pandas as pd
a = [1.1, 2.2, 3.3]
myvar = pd.Series(a)
print(myvar)
OUTPUT :
0 1.1
1 2.2
2 3.3
dtype: float64
import pandas as pd
a = ["apple", "banana", "orange"]
myvar = pd.Series(a)
print(myvar)
OUTPUT :
0 apple
1 banana
2 orange
dtype: object
import pandas as pd
#a list that mixes numbers and strings
a = [1, "banana", 3]
myvar = pd.Series(a)
print(myvar)
OUTPUT :
0 1
1 banana
2 3
dtype: object
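As a small sketch of how the datatype is decided, the following compares the inferred datatype of two lists and then overrides it with the "dtype" argument of pd.Series(). The variable names are only illustrative.

```python
import pandas as pd

# Integers only: pandas infers an integer dtype.
ints = pd.Series([1, 2, 3])

# One float among the values is enough to make the whole Series float64.
mixed_numbers = pd.Series([1, 2.5, 3])

# The dtype argument overrides the inference.
forced = pd.Series([1, 2, 3], dtype="float64")

print(ints.dtype, mixed_numbers.dtype, forced.dtype)
```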
2.2 Labels
If nothing else is specified, the values are labeled with their index number.
The first value has index 0, the second value has index 1, and so on.
This label can be used to access a specified value.
Example 2.2 : Return the second value of the Series:
import pandas as pd
a = [1, 2, 3, 4, 5, 6, 7]
myvar = pd.Series(a)
print(myvar[1])
OUTPUT :
2
import pandas as pd
a = [1, 2, 3, 4, 5, 6, 7]
myvar = pd.Series(a)
#a slice returns the values at positions 1 up to, but not including, 4
print(myvar[1:4])
OUTPUT :
1 2
2 3
3 4
dtype: int64
2.3 Create Labels
With the "index" argument, you can name your own labels.
Example 2.3 : Create your own labels
import pandas as pd
a = [1, 2, 3]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar)
OUTPUT :
x 1
y 2
z 3
dtype: int64
When you have created labels, you can access an item by referring to the label.
Example 2.4 : Return the value of "y":
import pandas as pd
a = [1, 2, 3]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar["y"])
OUTPUT :
2
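Besides square brackets, Pandas also offers the ".loc" and ".iloc" accessors, which make it explicit whether you mean a label or a position. The sketch below reuses the labels "x", "y" and "z" from the examples above.

```python
import pandas as pd

myvar = pd.Series([1, 2, 3], index=["x", "y", "z"])

# .loc looks a value up by its label.
print(myvar.loc["y"])

# .iloc looks a value up by its integer position, whatever the labels are.
print(myvar.iloc[1])

# Label slices with .loc include the end label, unlike position slices.
print(myvar.loc["x":"y"])
```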
3. Pandas DataFrames
3.1 What is a DataFrame?
A Pandas DataFrame is a 2-dimensional data structure, like a 2-dimensional
array, or a table with rows and columns.
In the Pandas module, the DataFrame is a basic and important type.
To create a DataFrame from different sources of data or other Python
datatypes, we can use the "DataFrame()" constructor.
Syntax of the DataFrame() class :
DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
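To make the constructor arguments concrete, here is a minimal sketch that passes "data", "index", "columns" and "dtype" by keyword. The values and labels are made up for the illustration.

```python
import pandas as pd

# Hypothetical values, chosen only to show each constructor argument.
df = pd.DataFrame(
    data=[[1, 2], [3, 4]],   # two rows of raw values
    index=["r1", "r2"],      # row labels
    columns=["a", "b"],      # column labels
    dtype="float64",         # force every column to this datatype
)

print(df)
print(df.dtypes)
```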
Example 3.1 Create an Empty DataFrame
To create an empty DataFrame, pass no arguments to the pandas.DataFrame() class.
In this example, we create an empty DataFrame and print it to the console
output.
import pandas as pd
df = pd.DataFrame()
print(df)
OUTPUT :
Empty DataFrame
Columns: []
Index: []
Example 3.2 : Create a simple Pandas DataFrame
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
#load data into a DataFrame object:
df = pd.DataFrame(data)
print(df)
OUTPUT :
calories duration
0 420 50
1 380 40
2 390 45
Example 3.3 Create a simple Pandas DataFrame with Labels - Index
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
print(df)
OUTPUT :
calories duration
day1 420 50
day2 380 40
day3 390 45
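Once the rows have labels, the ".loc" accessor can return rows by label. This is a short sketch that reuses the data from Example 3.3.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

# One label returns that row as a Series whose labels are the column names.
print(df.loc["day2"])

# A list of labels returns a smaller DataFrame.
print(df.loc[["day1", "day3"]])
```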
3.2 Create Pandas DataFrame from List of Lists
To create a Pandas DataFrame from a list of lists, pass the list of
lists as the data argument to "pandas.DataFrame()".
Each inner list is transformed into a row of the resulting
DataFrame.
Example 3.4: Create DataFrame from List of Lists

import pandas as pd
#list of lists
data = [['a1', 'b1', 'c1'],
        ['a2', 'b2', 'c2'],
        ['a3', 'b3', 'c3']]
#create the DataFrame; each inner list becomes one row
df = pd.DataFrame(data)
print(df)
OUTPUT :
0 1 2
0 a1 b1 c1
1 a2 b2 c2
2 a3 b3 c3
Example 3.5: Create DataFrame from List of Lists with Column Names & Index
import pandas as pd
#list of lists
data = [['a1', 'b1', 'c1'],
        ['a2', 'b2', 'c2'],
        ['a3', 'b3', 'c3']]
columns = ['C1', 'C2', 'C3']
index = ['R1', 'R2', 'R3']
#pass the row and column labels by keyword so their order cannot be mixed up
df = pd.DataFrame(data, index=index, columns=columns)
print(df)
OUTPUT :
C1 C2 C3
R1 a1 b1 c1
R2 a2 b2 c2
R3 a3 b3 c3
Example 3.5: Create DataFrame from List of Lists with Different List Lengths
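import pandas as pd
#list of lists; the first row has one more value than the others
data = [['a1', 'b1', 'c1', 'd1'],
        ['a2', 'b2', 'c2'],
        ['a3', 'b3', 'c3']]
#shorter rows are padded with missing values
df = pd.DataFrame(data)
print(df)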
OUTPUT :
0 1 2 3
0 a1 b1 c1 d1
1 a2 b2 c2 None
2 a3 b3 c3 None
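The padded cells count as missing values, so they can be found and filled like any other gap. The sketch below uses the same list of lists; the placeholder text 'missing' is only an example.

```python
import pandas as pd

# Rows of unequal length, as in the example above.
data = [['a1', 'b1', 'c1', 'd1'],
        ['a2', 'b2', 'c2'],
        ['a3', 'b3', 'c3']]
df = pd.DataFrame(data)

# isna() marks the padded cells as True; sum() counts them per column.
print(df.isna().sum())

# fillna() returns a copy with the gaps replaced by a placeholder of your choice.
print(df.fillna('missing'))
```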
3.3 Create Pandas DataFrame from Python Dictionary
You can create a DataFrame from a dictionary by passing the dictionary as the
data argument to the DataFrame() class.
Example 3.6: Create DataFrame from Dictionary
import pandas as pd
mydictionary = {'names': ['raju', 'ramu', 'ravi', 'akash'],

'physics': [68, 74, 77, 78],
'chemistry': [84, 56, 73, 69],
'algebra': [78, 88, 82, 87]}
#create dataframe using dictionary
df_marks = pd.DataFrame(mydictionary)
print(df_marks)
OUTPUT :
names physics chemistry algebra
0 raju 68 84 78
1 ramu 74 56 88
2 ravi 77 73 82
3 akash 78 69 87
Shape or Dimensions of Pandas DataFrame

To get the shape of Pandas DataFrame, use "DataFrame.shape".
The shape property returns a tuple representing the dimensionality of the
DataFrame.
The format of shape would be (rows, columns).
Example: DataFrame Shape

In the following example, we find the shape of a DataFrame.
You can also get the number of rows or the number of columns by indexing the
shape tuple.
import pandas as pd
data = [['a1', 'b1', 'c1'],
        ['a2', 'b2', 'c2'],
        ['a3', 'b3', 'c3'],
        ['a4', 'b4', 'c4']]
columns = ['C1', 'C2', 'C3']
index = ['R1', 'R2', 'R3', 'R4']
#create the DataFrame with row and column labels
df = pd.DataFrame(data, index=index, columns=columns)
print('The DataFrame is :\n', df)
#get dataframe shape
shape = df.shape
print('\nDataFrame Shape :', shape)
print('\nNumber of rows :', shape[0])
print('\nNumber of columns :', shape[1])
OUTPUT :
The DataFrame is :
C1 C2 C3
R1 a1 b1 c1
R2 a2 b2 c2
R3 a3 b3 c3
R4 a4 b4 c4
DataFrame Shape : (4, 3)
Number of rows : 4
Number of columns : 3
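A few related ways to read the dimensions are sketched below, using a small made-up DataFrame. Unpacking the shape tuple is often more readable than indexing it.

```python
import pandas as pd

df = pd.DataFrame([['a1', 'b1', 'c1'],
                   ['a2', 'b2', 'c2']],
                  columns=['C1', 'C2', 'C3'])

rows, cols = df.shape          # unpack the (rows, columns) tuple

print(rows, cols)
print(len(df))                 # number of rows
print(len(df.columns))         # number of columns
print(df.size)                 # total number of cells, rows * columns
```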
Print Information of Pandas DataFrame
To print information about a Pandas DataFrame, call the DataFrame.info() method.
The DataFrame.info() method returns nothing; it only prints information about
the DataFrame.
Example : Print DataFrame Information
In the following program, we create a DataFrame.
We then print this DataFrame’s information using the DataFrame.info() method.
import pandas as pd
df = pd.DataFrame(
[['abc', 22],
['xyz', 25],
['pqr', 31]],
columns=['name', 'age'])
print(df)
df.info()
OUTPUT :
name age
0 abc 22
1 xyz 25
2 pqr 31
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 name 3 non-null object
1 age 3 non-null int64
dtypes: int64(1), object(1)
memory usage: 176.0+ bytes
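Because info() prints instead of returning a value, its report has to be captured if you want to keep it as text. One possible way, sketched here, is the "buf" argument of info() together with io.StringIO; the variable names are only illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame([['abc', 22], ['xyz', 25], ['pqr', 31]],
                  columns=['name', 'age'])

# info() writes its report to the given buffer instead of the console.
buffer = io.StringIO()
result = df.info(buf=buffer)
report = buffer.getvalue()

print(result)   # info() itself returns None
print(report)   # the captured summary text
```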
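import pandas as pd
data = [['a1', 'b1', 'c1'],
        ['a2', 'b2', 'c2'],
        ['a3', 'b3', 'c3'],
        ['a4', 'b4', 'c4']]
columns = ['C1', 'C2', 'C3']
index = ['R1', 'R2', 'R3', 'R4']
#create the DataFrame with row and column labels
df = pd.DataFrame(data, index=index, columns=columns)
print(df)
df.info()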
OUTPUT :
C1 C2 C3
R1 a1 b1 c1
R2 a2 b2 c2
R3 a3 b3 c3
R4 a4 b4 c4
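<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, R1 to R4
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 C1 4 non-null object
1 C2 4 non-null object
2 C3 4 non-null object
dtypes: object(3)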
import pandas as pd
mydictionary = {'names': ['raju', 'ramu', 'ravi', 'akash'],

'physics': [68, 74, 77, 78],
'chemistry': [84, 56, 73, 69],
'algebra': [78, 88, 82, 87]}
#create dataframe using dictionary

df_marks = pd.DataFrame(mydictionary)
print(df_marks)
df_marks.info()
OUTPUT :
names physics chemistry algebra
0 raju 68 84 78
1 ramu 74 56 88
2 ravi 77 73 82
3 akash 78 69 87
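<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 names 4 non-null object
1 physics 4 non-null int64
2 chemistry 4 non-null int64
3 algebra 4 non-null int64
dtypes: int64(3), object(1)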
Pandas Read CSV

A simple way to store big data sets is to use CSV files (comma-separated
values files).
CSV files contain plain text and are a well-known format that can be read by
everyone, including Pandas.
import pandas as pd
#load dataframe from csv

df = pd.read_csv("pandas.csv")
#print dataframe
print(df)
OUTPUT :
Name maths physics chemisry
0 a 11 21 31
1 b 12 22 32
2 c 13 23 32
3 d 14 24 34
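If you do not have a CSV file at hand, you can try read_csv() on text held in memory. The sketch below wraps a made-up CSV string in io.StringIO, which read_csv() accepts in place of a file name.

```python
import io

import pandas as pd

# Hypothetical file content; io.StringIO lets read_csv treat a string as a file.
csv_text = """Name,maths,physics
a,11,21
b,12,22
c,13,23
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)   # each column's datatype is inferred while parsing
```

The first line of the text is used as the column names, and each following line becomes one row.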
Note : If you have a large DataFrame with many rows, Pandas will only print the first 5 rows
and the last 5 rows:
import pandas as pd

df = pd.read_csv("pandas1.csv")
#print dataframe
print(df)
OUTPUT :
Duration Pulse Maxpulse Calories
0 60 110 130 409.1
1 60 117 145 479.0
2 60 103 135 340.0
3 45 109 175 282.4
4 45 117 148 406.0
.. ... ... ... ...
164 60 105 140 290.8
165 60 110 145 300.0
166 60 115 145 310.2
167 75 120 150 320.4
168 75 125 150 330.4
[169 rows x 4 columns]
Tip: use to_string() to print the entire DataFrame.
import pandas as pd

df = pd.read_csv("pandas1.csv")
#print dataframe
print(df.to_string())
OUTPUT :
Duration Pulse Maxpulse Calories
0 60 110 130 409.1
1 60 117 145 479.0
2 60 103 135 340.0
3 45 109 175 282.4
4 45 117 148 406.0
5 60 102 127 300.0
6 60 110 136 374.0
7 45 104 134 253.3
8 30 109 133 195.1
9 60 98 124 269.0
10 60 103 147 329.3
11 60 100 120 250.7
…
…..
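Other ways to control how much of a DataFrame is shown are sketched below: head() and tail() for a few rows, and the "display.max_rows" option for a plain print(). The 100-row DataFrame is made up for the illustration.

```python
import pandas as pd

# Hypothetical DataFrame with 100 rows, long enough to be shortened when printed.
df = pd.DataFrame({"n": range(100), "square": [i * i for i in range(100)]})

print(df.head())       # first 5 rows
print(df.tail(3))      # last 3 rows

# to_string() renders every row: one header line plus one line per row.
full_text = df.to_string()
print(len(full_text.splitlines()))

# The display limit can also be lifted temporarily for a plain print().
with pd.option_context("display.max_rows", None):
    print(df)
```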
Pandas DataFrame to Excel

Example : Write DataFrame to Excel File
You can write the DataFrame to an Excel file without mentioning any sheet name.
The step-by-step process is given below:
1. Have your DataFrame ready. In this example we shall initialize a DataFrame
with some rows and columns.
2. Create an Excel Writer with the name of the output Excel file to which
you would like to write the DataFrame.
3. Call the to_excel() function on the DataFrame with the Excel Writer passed as
argument.
4. Save the Excel file using the close() method of the Excel Writer. Older Pandas
versions also offered save() for this step; it was removed in Pandas 2.0.
import pandas as pd
# create dataframe
df_marks = pd.DataFrame({'name': ['raju', 'ramu', 'ravi', 'akash'],
                         'physics': [68, 74, 77, 78],
                         'chemistry': [84, 56, 73, 69],
                         'algebra': [78, 88, 82, 87]})
# create excel writer object (needs an Excel engine such as openpyxl installed)
writer = pd.ExcelWriter('output2.xlsx')
# write dataframe to excel
df_marks.to_excel(writer)
# save the excel file; close() replaces save(), which newer Pandas versions no longer have
writer.close()
print('DataFrame is written successfully to Excel File.')
print(df_marks)
OUTPUT :
DataFrame is written successfully to Excel File.
name physics chemistry algebra
0 raju 68 84 78
1 ramu 74 56 88
2 ravi 77 73 82
3 akash 78 69 87
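The same steps can also be written with a "with" block, which closes the Excel Writer for you. This is a sketch under a few assumptions: the helper name write_marks, the sheet name "marks" and the file name in the usage note are made up, and an Excel engine such as openpyxl must be installed.

```python
import pandas as pd


def write_marks(path, sheet_name="marks"):
    """Write a small marks table to an Excel file and return the DataFrame.

    Assumes an Excel engine such as openpyxl is installed.
    """
    df_marks = pd.DataFrame({'name': ['raju', 'ramu'],
                             'physics': [68, 74]})
    # The with-block closes (and therefore saves) the writer automatically.
    with pd.ExcelWriter(path) as writer:
        # index=False leaves the 0, 1, ... row labels out of the sheet.
        df_marks.to_excel(writer, sheet_name=sheet_name, index=False)
    return df_marks
```

Calling write_marks('output3.xlsx') creates the file, and pd.read_excel('output3.xlsx', sheet_name='marks') should read the same table back.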
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('pandas.csv')
print(df)
# draw the numeric columns as a line chart, one line per column
df.plot()
# open the chart window
plt.show()
OUTPUT :
Name maths physics chemisry
0 a 11 21 31
1 b 12 22 32
2 c 13 23 32
3 d 14 24 34