Unit 4
Pandas
Introduction
• Pandas is a Python library used for working with data sets.
• It has functions for analyzing, cleaning, exploring, and manipulating
data.
• The name "Pandas" refers to both "Panel Data" and "Python Data
Analysis". The library was created by Wes McKinney in 2008.
• It provides various data structures and operations for manipulating
numerical data and time series.
• This library is built on top of the NumPy library.
• Pandas offers high performance and high productivity for users.
Advantages
• It is built on top of the NumPy library, which means that many
NumPy structures are used or replicated in Pandas.
• The data produced by Pandas are often used as input for plotting
functions of Matplotlib, statistical analysis in SciPy, and machine
learning algorithms in Scikit-learn.
• A Pandas program can be run from any text editor, but Jupyter
Notebook is recommended because it can execute the code in a
single cell rather than the entire file. Jupyter also provides an easy
way to visualize pandas data frames and plots.
Pandas data structures for manipulating data
• Pandas generally provides two data structures for manipulating data:
• Series
• DataFrame
Series
• In practice, a Pandas Series is usually created by loading a dataset
from existing storage, such as a SQL database, a CSV file, or an Excel file.
A Pandas Series can also be created from a list, a dictionary, a scalar
value, etc.
import pandas as pd
import numpy as np
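The listing above stops after the imports. As a minimal sketch of the creation routes just mentioned (list, dictionary, scalar), the following builds a small Series each way. The values and index labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# From a list: the index defaults to 0, 1, 2, ...
s_list = pd.Series([10, 20, 30])

# From a NumPy array with a custom index (labels are illustrative)
s_array = pd.Series(np.array([1.5, 2.5, 3.5]), index=['x', 'y', 'z'])

# From a dictionary: the keys become the index labels
s_dict = pd.Series({'a': 1, 'b': 2, 'c': 3})

# From a scalar: the value is repeated once for every index label
s_scalar = pd.Series(5, index=[0, 1, 2])

for s in (s_list, s_array, s_dict, s_scalar):
    print(s)
```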
• When a DataFrame is created from a dict of ndarrays or lists, all the
arrays must be of the same length. If an index is passed, the length of
the index must equal the length of the arrays.
• If no index is passed, then by default the index will be range(n),
where n is the array length.
import pandas as pd
# each key becomes a column; both lists have the same length
data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data)
print(df)
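The two rules about lengths and the index can be checked with a short sketch. The index labels rank1 to rank4 are illustrative assumptions, and the second part deliberately passes lists of unequal length to show that pandas rejects them.

```python
import pandas as pd

data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}

# No index passed: the index is 0..3, i.e. range(n)
df_default = pd.DataFrame(data)

# Index passed: its length (4) matches the length of the lists
df_labeled = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
print(df_labeled)

# Lists of unequal length are rejected with a ValueError
try:
    pd.DataFrame({'Name': ['Tom', 'Jack'], 'Age': [28]})
    error_raised = False
except ValueError as exc:
    error_raised = True
    print("ValueError:", exc)
```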
Create a DataFrame from List of Dicts
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)
The following example shows how to create a DataFrame by passing a
list of dictionaries and the row indices.
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first', 'second'])
print(df)
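In the list-of-dicts examples the first dictionary has no 'c' key. A small sketch of what that means for the result: pandas fills the missing cell with NaN, and the column becomes floating point as a consequence.

```python
import pandas as pd

data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first', 'second'])

# The keys of all dictionaries together form the columns
print(df.columns.tolist())

# 'first' has no 'c' value, so that cell is NaN
print(df.loc['first', 'c'])
print(df['c'].isna())
```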
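Create a DataFrame from Dict of Series
import pandas as pd
# illustrative values; each Series carries its own index
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df)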
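When the Series in the dict do not share the same index, the DataFrame is built on the union of their labels. The sketch below uses hypothetical subject names and student labels to show the alignment.

```python
import pandas as pd

# Hypothetical marks; 's4' appears only in the second Series
marks = {
    'maths': pd.Series([70, 85, 90], index=['s1', 's2', 's3']),
    'physics': pd.Series([65, 75, 80, 95], index=['s1', 's2', 's3', 's4']),
}
df = pd.DataFrame(marks)

# Rows are the union of both indexes; the missing maths mark is NaN
print(df)
print(df.loc['s4'])
```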
Pandas DataFrame/Series.head() method
• Python is a great language for doing data analysis, primarily because of the fantastic
ecosystem of data-centric Python packages. Pandas is one of those packages and makes
importing and analyzing data much easier.
• The Pandas head() method returns the top n (5 by default) rows of a data frame or
series.
• Syntax: DataFrame.head(n=5)
• Parameters:
• n: integer value, the number of rows to be returned
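# importing pandas module
import pandas as pd
# making data frame
data = pd.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/nba.csv")
# returning top 5 rows (the default value of n)
data_top = data.head()
# display
data_top
• In this example, the .head() method is called on a series with a custom value for the
n parameter to return the top 9 rows of the series.
# importing pandas module
import pandas as pd
# making data frame
data = pd.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/nba.csv")
# number of rows to return
n = 9
# creating series
series = data["Name"]
# returning top n rows
top = series.head(n=n)
# display
top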
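The head() examples above read a CSV file over the network. As a sketch that runs without a download, the same calls can be tried on a small in-memory frame. The column names and values here are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
                   'Score': [1, 2, 3, 4, 5, 6, 7]})

top_default = df.head()             # first 5 rows (n defaults to 5)
top_three = df.head(3)              # first 3 rows
names_top_two = df['Name'].head(2)  # head() also works on a Series
all_but_last_two = df.head(-2)      # negative n: everything except the last 2 rows

print(top_three)
print(names_top_two)
```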
ADDING A NEW ROW
• We can add a single row using DataFrame.loc. To add the row at
the end of the dataframe, we can get the number of rows
using len(DataFrame.index), which gives the position at which the
new row is added.
• Example
import pandas as pd
df = pd.DataFrame({'RollNO': [1, 2, 3, 4, 5],
                   'Name': ['Nina', 'Mohan', 'John', 'Priya', 'Aakash'],
                   'Marks': [89, 96, 78, 60, 99]})
print("Original DataFrame")
print(df)
print("After adding new row DataFrame")
# len(df.index) is 5 here, so the new row is added at position 5
df.loc[len(df.index)] = [6, 'Sunita', 77]
print(df)
OUTPUT
Original DataFrame
   RollNO    Name  Marks
0       1    Nina     89
1       2   Mohan     96
2       3    John     78
3       4   Priya     60
4       5  Aakash     99
After adding new row DataFrame
   RollNO    Name  Marks
0       1    Nina     89
1       2   Mohan     96
2       3    John     78
3       4   Priya     60
4       5  Aakash     99
5       6  Sunita     77
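DataFrame.loc adds one row at a time. When several rows need to be added, a different technique, pd.concat, is a common choice. The sketch below is not the method used above, and the extra rows are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'RollNO': [1, 2, 3, 4, 5],
                   'Name': ['Nina', 'Mohan', 'John', 'Priya', 'Aakash'],
                   'Marks': [89, 96, 78, 60, 99]})

# Hypothetical extra rows, built as their own DataFrame with the same columns
new_rows = pd.DataFrame({'RollNO': [6, 7],
                         'Name': ['Sunita', 'Ravi'],
                         'Marks': [77, 85]})

# ignore_index=True renumbers the result 0..n-1 instead of repeating 0 and 1
combined = pd.concat([df, new_rows], ignore_index=True)
print(combined)
```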
Adding Column in Data Frame
Original DataFrame
   RollNO    Name  Marks
0       1    Nina     89
1       2   Mohan     96
2       3    John     78
3       4   Priya     60
4       5  Aakash     99
   RollNO    Name  Marks Division
0       1    Nina     89      IVA
1       2   Mohan     96      IVA
2       3    John     78      IVA
3       4   Priya     60      IVA
4       5  Aakash     99      IVA
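The listing that produced the output above is not shown. The output is consistent with plain column assignment, so the sketch below assumes that approach: assigning a scalar to a new column name appends the column at the end and repeats the value for every row.

```python
import pandas as pd

df = pd.DataFrame({'RollNO': [1, 2, 3, 4, 5],
                   'Name': ['Nina', 'Mohan', 'John', 'Priya', 'Aakash'],
                   'Marks': [89, 96, 78, 60, 99]})
print("Original DataFrame")
print(df)

# Assumed first method: direct assignment to a new column name
df['Division'] = 'IVA'
print(df)
```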
METHOD 2: ADD COLUMNS AT A SPECIFIC INDEX
import pandas as pd
df = pd.DataFrame({'RollNO': [1, 2, 3, 4, 5],
                   'Name': ['Nina', 'Mohan', 'John', 'Priya', 'Aakash'],
                   'Marks': [89, 96, 78, 60, 99]})
print("Original DataFrame")
print(df)
# insert(position, column name, value): position 3 is after 'Marks'
df.insert(3, "Division", 'IVA')
print(df)
OUTPUT
Original DataFrame
   RollNO    Name  Marks
0       1    Nina     89
1       2   Mohan     96
2       3    John     78
3       4   Priya     60
4       5  Aakash     99
   RollNO    Name  Marks Division
0       1    Nina     89      IVA
1       2   Mohan     96      IVA
2       3    John     78      IVA
3       4   Priya     60      IVA
4       5  Aakash     99      IVA
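In the example above, position 3 happens to be the end of the frame, so the effect of the position argument is not visible. A short sketch with position 1 and one value per row makes it clearer. The division values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'RollNO': [1, 2, 3, 4, 5],
                   'Name': ['Nina', 'Mohan', 'John', 'Priya', 'Aakash'],
                   'Marks': [89, 96, 78, 60, 99]})

# Position 1 places the new column between 'RollNO' and 'Name'.
# A list supplies one value per row, so its length must match the row count.
df.insert(1, "Division", ['IVA', 'IVB', 'IVA', 'IVB', 'IVA'])
print(df)
```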
METHOD 3: ADD COLUMNS WITH LOC
import pandas as pd
df = pd.DataFrame({'RollNO': [1, 2, 3, 4, 5],
                   'Name': ['Nina', 'Mohan', 'John', 'Priya', 'Aakash'],
                   'Marks': [89, 96, 78, 60, 99]})
print("Original DataFrame")
print(df)
# ':' selects every row; "Division" names the new column
df.loc[:, "Division"] = 'IVA'
print(df)
OUTPUT
Original DataFrame
   RollNO    Name  Marks
0       1    Nina     89
1       2   Mohan     96
2       3    John     78
3       4   Priya     60
4       5  Aakash     99
   RollNO    Name  Marks Division
0       1    Nina     89      IVA
1       2   Mohan     96      IVA
2       3    John     78      IVA
3       4   Priya     60      IVA
4       5  Aakash     99      IVA
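Because loc takes a row selector as well as a column name, it can also set different values for different rows. The sketch below assumes a hypothetical 'Grade' column and an 80-mark cut-off, neither of which comes from the examples above.

```python
import pandas as pd

df = pd.DataFrame({'RollNO': [1, 2, 3, 4, 5],
                   'Name': ['Nina', 'Mohan', 'John', 'Priya', 'Aakash'],
                   'Marks': [89, 96, 78, 60, 99]})

# Create the column with a default value for every row
df.loc[:, 'Grade'] = 'B'

# Overwrite only the rows where the condition holds
df.loc[df['Marks'] >= 80, 'Grade'] = 'A'
print(df)
```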
Using drop() to remove a specific row
• Dropping row with index
import pandas as pd
df = pd.DataFrame({'RollNO': [1, 2, 3, 4, 5],
                   'Name': ['Nina', 'Mohan', 'John', 'Priya', 'Aakash'],
                   'Marks': [89, 96, 78, 60, 99]})
print("Original DataFrame")
print(df)
# axis=0 means rows; inplace=True modifies df instead of returning a copy
df.drop(4, axis=0, inplace=True)
print(df)
OUTPUT
Original DataFrame
   RollNO    Name  Marks
0       1    Nina     89
1       2   Mohan     96
2       3    John     78
3       4   Priya     60
4       5  Aakash     99
   RollNO   Name  Marks
0       1   Nina     89
1       2  Mohan     96
2       3   John     78
3       4  Priya     60
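drop() also accepts a list of index labels, and without inplace=True it returns a new DataFrame and leaves the original untouched. A minimal sketch, with reset_index added to renumber the remaining rows:

```python
import pandas as pd

df = pd.DataFrame({'RollNO': [1, 2, 3, 4, 5],
                   'Name': ['Nina', 'Mohan', 'John', 'Priya', 'Aakash'],
                   'Marks': [89, 96, 78, 60, 99]})

# Drop the rows labelled 1 and 3; df itself still has all 5 rows
trimmed = df.drop([1, 3])
print(trimmed)

# The remaining labels are 0, 2, 4; reset_index(drop=True) renumbers them 0, 1, 2
renumbered = trimmed.reset_index(drop=True)
print(renumbered)
```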
Deleting Column
import pandas as pd
df = pd.DataFrame({'RollNO': [1, 2, 3, 4, 5],
                   'Name': ['Nina', 'Mohan', 'John', 'Priya', 'Aakash'],
                   'Marks': [89, 96, 78, 60, 99]})
print("Original DataFrame")
print(df)
# axis=1 means columns; inplace=True modifies df instead of returning a copy
df.drop('Marks', inplace=True, axis=1)
print(df)
OUTPUT
Original DataFrame
   RollNO    Name  Marks
0       1    Nina     89
1       2   Mohan     96
2       3    John     78
3       4   Priya     60
4       5  Aakash     99
   RollNO    Name
0       1    Nina
1       2   Mohan
2       3    John
3       4   Priya
4       5  Aakash
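As an alternative to passing axis=1, drop() has a columns keyword that states the intent directly and accepts either one name or a list of names. A minimal sketch that returns new frames instead of modifying df in place:

```python
import pandas as pd

df = pd.DataFrame({'RollNO': [1, 2, 3, 4, 5],
                   'Name': ['Nina', 'Mohan', 'John', 'Priya', 'Aakash'],
                   'Marks': [89, 96, 78, 60, 99]})

# Same effect as df.drop('Marks', axis=1), but df is left unchanged
without_marks = df.drop(columns='Marks')

# Several columns can be dropped in one call
only_roll = df.drop(columns=['Name', 'Marks'])

print(without_marks)
print(only_roll)
```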