Data Handling Using Pandas
Pandas:
What is Pandas?
• Pandas is a Python library used for working with data sets.
• It has functions for analyzing, cleaning, exploring, and manipulating data.
• It is a package useful for data analysis and manipulation.
• Pandas provides an easy way to create, manipulate and wrangle data.
• Pandas provides powerful and easy-to-use data structures, as well as the
means to quickly perform operations on these structures.
Data Science: a branch of computer science in which we study how to store, use
and analyze data in order to derive information from it.
e.g.-
Index Data
0 10
1 15
2 18
3 22
Program-
import pandas as pd
import numpy as np
arr=np.array([10,15,18,22])   # Here we create an array of 4 values.
s = pd.Series(arr)
print(s)
Output-
0    10
1    15
2    18
3    22
(The left column is the default index; the right column is the data.)
How to create Series with Mutable index
Program-
print(s)
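A minimal sketch of a Series with a user-defined index, assuming the labels are supplied through the index argument of pd.Series. The labels used here are illustrative only.

```python
import pandas as pd

# Illustrative labels: one label per value.
s = pd.Series([10, 15, 18, 22], index=['a', 'b', 'c', 'd'])
print(s)

# The index is mutable as a whole: it can be replaced after creation.
s.index = ['w', 'x', 'y', 'z']
print(s['x'])   # value stored under the label 'x'
```

Assigning a new list to s.index requires exactly one label per value.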
Creating a series from Scalar value
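A short sketch of this case, assuming the scalar is passed together with an index: the single value is then repeated once for every index label. The value 5 and the index are illustrative.

```python
import pandas as pd

# The scalar 5 is repeated for each of the four index labels.
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)
```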
Print all the values of the Series that are greater than 2.
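One way to do this is with a boolean condition placed inside []. The sketch below uses illustrative values, not the data of the original example.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])   # illustrative values
mask = s > 2                     # boolean Series: True where the value exceeds 2
result = s[mask]                 # keeps only the rows where mask is True
print(result)
```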
Example-2
Result of s.head()
Result of s.head(3)
tail(): It is used to access the last 5 rows of a series.
Note: To access the last 4 rows we can call series_name.tail(4)
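The behaviour of head() and tail() can be sketched as follows, using an illustrative Series of ten values.

```python
import pandas as pd

s = pd.Series(range(10, 20))   # ten illustrative values: 10, 11, ..., 19

first_five = s.head()     # default: first 5 rows
first_three = s.head(3)   # first 3 rows
last_five = s.tail()      # default: last 5 rows
last_four = s.tail(4)     # last 4 rows

print(first_three)
print(last_four)
```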
Selection in Series
Series provides loc (label-based), iloc (position-based) and [] to access
its elements.
Syntax:-series_name.loc[StartRange: StopRange]
Example-
Syntax:-series_name.iloc[StartRange : StopRange]
Example-
Syntax:-series_name[StartRange : StopRange] or
series_name[index]
Example-
Example-
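The three forms of selection can be compared in one sketch. The Series and its string labels are illustrative assumptions.

```python
import pandas as pd

s = pd.Series([10, 15, 18, 22], index=['a', 'b', 'c', 'd'])

by_label = s.loc['b':'c']    # label slice: both 'b' and 'c' are included
by_position = s.iloc[1:3]    # position slice: position 3 is excluded
single = s['d']              # one element, selected by its label

print(by_label)
print(by_position)
print(single)
```

Note the difference in the stop value: a loc slice includes its end label, while an iloc slice stops before its end position.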
Slicing in Series
A slice is written as series_name[start : end : step], where start is the
position of the first item, end is the position at which the slice stops
(the item at end itself is not included), and step is the increment
between the items that you would like.
Example :-
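A minimal sketch of start, end and step on illustrative data. It is written with iloc so that the numbers are unambiguously positions.

```python
import pandas as pd

s = pd.Series([10, 15, 18, 22, 25, 30])   # illustrative values

every_second = s.iloc[1:5:2]   # positions 1 and 3 (position 5 is excluded)
from_start = s.iloc[:3]        # positions 0, 1, 2
reverse = s.iloc[::-1]         # negative step walks backwards

print(every_second)
print(reverse)
```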
DATAFRAME
DATAFRAME STRUCTURE
INDEX   DATA
0       ROHIT    MI    13
1       VIRAT    RCB   17
2       HARDIK   MI    14
PROPERTIES OF DATAFRAME
A DataFrame can be created from:
1. Series
2. Lists
3. Dictionary
4. A numpy 2D array
Program-
import pandas as pd
s = pd.Series(['a','b','c','d'])
df=pd.DataFrame(s)
print(df)
Output-
   0
0  a
1  b
2  c
3  d
(The default column name is 0.)
DataFrame from Dictionary of Series
Example-
Example-
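A minimal sketch of this construction: each dictionary key becomes a column name and each Series becomes that column. The column names 'name', 'team' and 'score' are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical column names; the values are illustrative.
data = {
    'name': pd.Series(['ROHIT', 'VIRAT', 'HARDIK']),
    'team': pd.Series(['MI', 'RCB', 'MI']),
    'score': pd.Series([13, 17, 14]),
}
df = pd.DataFrame(data)   # rows are aligned on the index of the Series
print(df)
```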
Iteration on Rows and Columns
1. iterrows()
2. iteritems() (now named items(); iteritems() was removed in pandas 2.0)
iterrows()
Example-
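Both kinds of iteration can be sketched on a small illustrative data frame. items() is used for the column-wise loop because iteritems() is not available in pandas 2.0 and later.

```python
import pandas as pd

df = pd.DataFrame({'empid': [101, 102], 'ename': ['Sachin', 'Vinod']})

# iterrows() yields (index label, row as a Series) pairs.
rows = []
for idx, row in df.iterrows():
    rows.append((idx, row['ename']))
    print(idx, row['empid'], row['ename'])

# items() yields (column name, column as a Series) pairs.
cols = []
for col_name, col in df.items():
    cols.append(col_name)
    print(col_name, col.tolist())
```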
Select operation in data frame
To access the data of a column, we can give the column name as a
subscript.
e.g. - df['empid']. This can also be done by using df.empid.
To access multiple columns we can write df[['col1', 'col2', ...]]
Example -
>>df.empid or df['empid']
0 101
1 102
2 103
3 104
4 105
5 106
Name: empid, dtype: int64
>>df[['empid','ename']]
empid ename
0 101 Sachin
1 102 Vinod
2 103 Lakhbir
3 104 Anil
4 105 Devinder
5 106 UmaSelvi
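A self-contained sketch of the same selections, using a shortened copy of the employee data. A single column name returns a Series, while a list of names returns a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'empid': [101, 102, 103],
    'ename': ['Sachin', 'Vinod', 'Lakhbir'],
})

one = df['empid']              # a Series
same = df.empid                # attribute access gives the same Series
two = df[['empid', 'ename']]   # a list of names gives a DataFrame

print(one)
print(two)
```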
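To Add & Rename a column in data frame
import pandas as pd
s = pd.Series([10,15,18,22])
df=pd.DataFrame(s)
df.columns=['List1']                 # rename the default column 0 to List1
print(df)
df['List2']=20                       # add a column holding a constant value
print(df)
df['List3']=df['List1']+df['List2']  # add a column computed from two others
Output-
   List1
0     10
1     15
2     18
3     22
   List1  List2
0     10     20
1     15     20
2     18     20
3     22     20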
To Delete a Column Using drop()
import pandas as pd
s= pd.Series([10,20,30,40])
df=pd.DataFrame(s)
df.columns=['List1']
df['List2']=40
df1=df.drop('List2',axis=1)       # axis=1 means delete column-wise
df2=df.drop(index=[2,3],axis=0)   # axis=0 means delete row-wise, by the given index labels
print(df)
print(" After deletion::")
print(df1)
print(" After row deletion::")
print(df2)
Output-
   List1  List2
0     10     40
1     20     40
2     30     40
3     40     40
 After deletion::
   List1
0     10
1     20
2     30
3     40
 After row deletion::
   List1  List2
0     10     40
1     20     40
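Accessing the data frame through loc and iloc
(indexing using labels and positions)
Pandas provides the loc and iloc indexers to access a subset of a
data frame by row/column. loc selects by label and iloc selects by
integer position. Both are written with square brackets, not called
like methods.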
Syntax-
Syntax-
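A minimal sketch of both indexers on a shortened copy of the employee data. The form shown, rows first and then columns separated by a comma, is the standard pandas usage.

```python
import pandas as pd

df = pd.DataFrame({
    'empid': [101, 102, 103, 104],
    'ename': ['Sachin', 'Vinod', 'Lakhbir', 'Anil'],
})

# loc: label-based; both ends of each slice are included.
by_label = df.loc[1:2, 'empid':'ename']

# iloc: position-based; the stop position of each slice is excluded.
by_position = df.iloc[1:3, 0:2]

print(by_label)
print(by_position)
```

Here both selections return the rows labelled 1 and 2 with both columns, because the default index labels coincide with the row positions.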
The method head() gives the first 5 rows and the method
tail() returns the last 5 rows.
import pandas as pd
empdata={ 'Doj':['12-01-2012','15-01-2012','05-09-2007',
'17-01-2012','05-09-2007','16-01-2012'],
'empid':[101,102,103,104,105,106],
'ename':['Sachin','Vinod','Lakhbir','Anil','Devinder','UmaSelvi']
}
df=pd.DataFrame(empdata)
print(df)
print(df.head())
print(df.tail())
Output-
Data Frame:
          Doj  empid     ename
0  12-01-2012    101    Sachin
1  15-01-2012    102     Vinod
2  05-09-2007    103   Lakhbir
3  17-01-2012    104      Anil
4  05-09-2007    105  Devinder
5  16-01-2012    106  UmaSelvi
head() displays the first 5 rows:
          Doj  empid     ename
0  12-01-2012    101    Sachin
1  15-01-2012    102     Vinod
2  05-09-2007    103   Lakhbir
3  17-01-2012    104      Anil
4  05-09-2007    105  Devinder
tail() displays the last 5 rows:
          Doj  empid     ename
1  15-01-2012    102     Vinod
2  05-09-2007    103   Lakhbir
3  17-01-2012    104      Anil
4  05-09-2007    105  Devinder
5  16-01-2012    106  UmaSelvi
CREATED BY: SACHIN BHARDWAJ PGT(CS) KV NO1 TEZPUR, VINOD VERMA PGT (CS) KV OEF KANPUR
Visit Python4csip.com for more updates
To display the first 2 rows we can use head(2), to return the last 2
rows we can use tail(2), and to return the 3rd to 5th rows we can write
df[2:5].
import pandas as pd
empdata={ 'Doj':['12-01-2012','15-01-2012','05-09-2007',
'17-01-2012','05-09-2007','16-01-2012'],
'empid':[101,102,103,104,105,106],
'ename':['Sachin','Vinod','Lakhbir','Anil','Devinder','UmaSelvi']
}
df=pd.DataFrame(empdata)
print(df)
print(df.head(2))
print(df.tail(2))
print(df[2:5])
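Output-
          Doj  empid     ename
0  12-01-2012    101    Sachin
1  15-01-2012    102     Vinod
2  05-09-2007    103   Lakhbir
3  17-01-2012    104      Anil
4  05-09-2007    105  Devinder
5  16-01-2012    106  UmaSelvi
          Doj  empid   ename
0  12-01-2012    101  Sachin
1  15-01-2012    102   Vinod
          Doj  empid     ename
4  05-09-2007    105  Devinder
5  16-01-2012    106  UmaSelvi
          Doj  empid     ename
2  05-09-2007    103   Lakhbir
3  17-01-2012    104      Anil
4  05-09-2007    105  Devinder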
Example-1
Example-2
1. Full Outer Join:- The full outer join combines the results of
both the left and the right outer joins. The joined data frame will
contain all records from both data frames, with NaN filled in for
missing matches on either side. You can perform a full outer join by
specifying the how argument as 'outer' in the merge() function.
Example-
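A minimal sketch of a full outer join. The two data frames, the 'dept' column and its values are illustrative assumptions.

```python
import pandas as pd

left = pd.DataFrame({'empid': [101, 102, 103],
                     'ename': ['Sachin', 'Vinod', 'Lakhbir']})
right = pd.DataFrame({'empid': [102, 103, 104],
                      'dept': ['CS', 'IT', 'HR']})   # hypothetical column

# how='outer' keeps every empid found in either data frame.
outer = pd.merge(left, right, on='empid', how='outer')
print(outer)
```

empid 101 has no match in right, so its dept is NaN. empid 104 has no match in left, so its ename is NaN.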
Example-2
2. Inner Join :- The inner join produces only those records that
match in both data frames. You have to pass 'inner' as the how argument
of the merge() function.
Example-
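A minimal sketch of an inner join. The two data frames, the 'dept' column and its values are illustrative assumptions.

```python
import pandas as pd

left = pd.DataFrame({'empid': [101, 102, 103],
                     'ename': ['Sachin', 'Vinod', 'Lakhbir']})
right = pd.DataFrame({'empid': [102, 103, 104],
                      'dept': ['CS', 'IT', 'HR']})   # hypothetical column

# how='inner' keeps only the empid values present in both data frames.
inner = pd.merge(left, right, on='empid', how='inner')
print(inner)
```

Only empid 102 and 103 appear in both data frames, so the result has two rows and contains no NaN.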
Example-
Example-
Example-
CSV File
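A short sketch of writing a data frame to CSV and reading it back with to_csv() and read_csv(). An in-memory buffer stands in for a file here so that the example is self-contained. This is an assumption made for illustration; a file path can be passed instead.

```python
import io
import pandas as pd

df = pd.DataFrame({'empid': [101, 102], 'ename': ['Sachin', 'Vinod']})

# The buffer plays the role of a CSV file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False: do not write the row labels
print(buffer.getvalue())

buffer.seek(0)                   # rewind before reading
df2 = pd.read_csv(buffer)        # the first line is taken as the header
print(df2)
```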