Unit I: Data Handling Using Pandas and Data Visualization: Marks:25
• NumPy, Pandas and Matplotlib are three well-established Python libraries. These
libraries allow us to manipulate, transform and visualize data easily and efficiently.
Output:
Series([], dtype: float64)
Creating a series using Series() method with Arguments
A series is created using Series() method by passing index and data elements as
the arguments to it.
Syntax:
<Series object> = pandas. Series(data, index =idx)
* The printed series has two columns: the index on the left and the data values on the
right. If we don't specify an index, the default index 0 to N-1 is used.
Create a Series using a list:
# Example 2: creating a series using Series() with a list as an argument
>>> import pandas as pd
>>> s1 = pd.Series([10, 20, 30, 40])
>>> print(s1)
Output:
0 10
1 20
2 30
3 40
dtype: int64
Creating a series using range method
>>>import pandas as pd
>>> s1 = pd.Series(range(5))
>>> print(s1)
0 0
1 1
2 2
3 3
4 4
dtype: int64
Creating a series with explicit index values:
>>> import pandas as pd
>>> s1 = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
>>> print(s1)
a 10
b 20
c 30
d 40
e 50
dtype: int64
Creating a Series from ndarray
Without index argument
>>> import pandas as pd
>>> import numpy as np
>>> data = np.array(['a', 'b', 'c', 'd'])
>>> s1 = pd.Series(data)
>>> print(s1)
Output:
0 a
1 b
2 c
3 d
dtype: object
Creating a Series from ndarray
With index argument
>>> import pandas as pd
>>> import numpy as np
>>> data = np.array(['a', 'b', 'c', 'd'])
>>> s1 = pd.Series(data, index=[100, 101, 102, 103])
>>> print(s1)
Output:
100 a
101 b
102 c
103 d
dtype: object
Create a Series from dict
Eg.1(without index)
>>> import pandas as pd
>>> data = {'a':0,'b':1,'c':2}
>>> s1 = pd.Series ( data)
>>> print(s1)
Output:
a 0
b 1
c 2
dtype: int64
Eg.2 (with index)
>>> import pandas as pd
>>> data = {'a':0,'b':1,'c':2}
>>> s1 =pd.Series( data, index= ['b' ,'c', 'd' ,'a'])
>>> print(s1)
Output:
b 1.0
c 2.0
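d NaN
a 0.0
dtype: float64
Note: NaN stands for Not a Number. The label 'd' is not a key of the dictionary, so its
value is missing, and the dtype becomes float64 in order to hold NaN.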
Create a Series from Scalar
>>> import pandas as pd
>>> s1 =pd.Series(5, index=[1,2,3,4])
>>> print(s1)
Output:
1 5
2 5
3 5
4 5
dtype: int64
Note: here 5 is repeated 4 times (once for each index label).
Creating a series using the arange method of NumPy
>>> import pandas as pd
>>> import numpy as np
>>> s1 = pd.Series(np.arange(10, 16, 1), index=['a', 'b', 'c', 'd', 'e', 'f'])
>>> print(s1)
a 10
b 11
c 12
d 13
e 14
f 15
dtype: int32
Accessing elements of a series
* There are two methods: indexing and slicing.
A) Indexing
There are two types of index: positional and labelled. The positional index is the
default index starting from 0, whereas a labelled index is a user-defined index.
Example 1:
>>> import pandas as pd
>>>s1 = pd.Series([ 10, 20,30, 40,50])
>>>print(s1[2] )
30
Example 2:
>>> import pandas as pd
>>>s1 = pd.Series([ 10, 20,30, 40,50],index = ['a','b','c','d','e'])
>>> print(s1['d'] )
40
>>> print(s1[['a','c','e']])
Output:
a 10
c 30
e 50
dtype: int64
Example 3:
>>> import pandas as pd
>>> sercap = pd.Series(['NewDelhi', 'London', 'Paris'],
...                    index=['India', 'UK', 'France'])
>>> print(sercap['India'])
NewDelhi
>>> print(sercap[['UK', 'France']])
UK London
France Paris
dtype: object
How to assign new index values to series
>>>sercap.index=[10,20,30]
>>>print(sercap)
10 NewDelhi
20 London
30 Paris
dtype: object
B) Slicing
• Similar to slicing with NumPy arrays
• Slicing can be done by specifying the starting and ending parameters.
• In positional index the value at the end index position is excluded.
Example:
>>> import pandas as pd
>>> sercap = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
...                    index=['India', 'USA', 'UK', 'France'])
>>> print(sercap[1:3])
output
USA WashingtonDC
UK London
dtype: object
Example using labelled index (the end label is included)
>>> import pandas as pd
>>> sercap = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
...                    index=['India', 'USA', 'UK', 'France'])
>>> print(sercap['USA':'France'])
USA WashingtonDC
UK London
France Paris
dtype: object
Series in reverse order slicing
>>> import pandas as pd
>>> sercap=pd.Series(['NewDelhi','WashingtonDC','London','Paris'],
index=['India','USA','UK','France'])
>>>print(sercap[: : -1])
France Paris
UK London
USA WashingtonDC
India NewDelhi
dtype: object
How to modify the values of series using slicing
>>> import pandas as pd
>>> s1=pd.Series(range(10,16,1),index=['a','b','c','d','e','f'])
>>> s1[1:3]=50
>>> print(s1)
a 10
b 50
c 50
d 13
e 14
f 15
dtype: int64
Example 2: using index label
>>> import pandas as pd
>>> s1 = pd.Series(range(10, 16, 1), index=['a', 'b', 'c', 'd', 'e', 'f'])
>>> s1['c':'e'] = 500
>>> print(s1)
a 10
b 11
c 500
d 500
e 500
f 15
dtype: int64
Accessing data from a Series with indexing and slicing (using position)
>>> import pandas as pd
>>> s1 = pd.Series([11, 12, 13, 14, 15], index=['a', 'b', 'c', 'd', 'e'])
>>> print(s1[0])
11
>>> print(s1['a'])
11
>>> print(s1[:3])
a 11
b 12
c 13
dtype: int64
>>> print(s1[-3:])
c 13
d 14
e 15
dtype: int64
The first statement displays the element at position 0; s1['a'] returns the same element by its label.
The second statement displays the first 3 elements of the series.
The third statement displays the last 3 elements because of negative indexing.
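The positional lookups above can also be written with iloc, which states the intent explicitly. This is a minimal sketch, assuming a recent pandas release; the variable names other than s1 are introduced here only for illustration.

```python
import pandas as pd

s1 = pd.Series([11, 12, 13, 14, 15], index=['a', 'b', 'c', 'd', 'e'])

first_by_position = s1.iloc[0]   # element at position 0
first_by_label = s1.loc['a']     # element whose label is 'a'
first_three = s1.iloc[:3]        # positions 0, 1 and 2 (end position excluded)
last_three = s1.iloc[-3:]        # the last three positions

print(first_by_position, first_by_label)
print(first_three)
print(last_three)
```

Writing iloc for positions and loc for labels avoids any ambiguity when the index labels are themselves integers.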
Retrieving data with loc and iloc:
• loc is used for indexing or selecting based on name, i.e., by row label.
• iloc is used for indexing or selecting based on position, i.e., by row number.
e.g.
>>> import pandas as pd
>>> import numpy as np
>>> s1 = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])
>>> print(s1.loc[49:47])   # selects the data according to the index label
Output:
49 NaN
48 NaN
47 NaN
dtype: float64
>>> print(s1.loc[49:1])    # selects the data according to the index label
>>> print(s1.iloc[:6])     # selects the data according to the position
Output (the same for both statements):
49 NaN
48 NaN
47 NaN
46 NaN
45 NaN
1 NaN
dtype: float64
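The difference between the two selectors is easiest to see when the labels are integers. The sketch below reuses the series from the example; the names by_label and by_position are illustrative only.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

by_label = s1.loc[49:47]     # labels 49 through 47; the end label is included
by_position = s1.iloc[0:3]   # positions 0, 1 and 2; the end position is excluded

print(by_label.index.tolist())
print(by_position.index.tolist())
```

Both selections return the same three rows here, but for different reasons: loc matches the labels 49 and 47, while iloc counts rows from the top.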
Conditional Filtering Entries:
>>> import pandas as pd
>>> s1 = pd.Series([1.00000, 1.414214, 1.730751, 2.000000])
>>> print(s1)
Output:
0 1.000000
1 1.414214
2 1.730751
3 2.000000
dtype: float64
>>> print(s1 < 2)
Output:
0 True
1 True
2 True
3 False
dtype: bool
>>> print(s1[s1 >= 2])
Output:
3 2.0
dtype: float64
>>> print(s1[s1 < 2])
Output:
0 1.000000
1 1.414214
2 1.730751
dtype: float64
Note:
• The statement s1 < 2 performs a vectorized operation that checks every element in the series.
• The statement s1[s1 >= 2] performs a filtering operation and returns only the elements
for which the expression is True.
Conditional Filtering Entries
Filtering entries from a series object can be done using expressions that are of
Boolean type.
<Series object> [ <Boolean expression on series object>]
Example:
Series object s11 stores the charity contribution made by each section
A 6700
B 5600
C 5000
D 5200
Write a program to display which section contributed more than Rs. 5500
Output:
Contribution >5500 are:
A 6700
B 5600
dtype: int64
Program:
>>> import pandas as pd
>>> s11= pd.Series([6700,5600,5000,5200],index=['A','B','C','D'])
>>> print("Contribution >5500 are:")
>>> print(s11[s11>5500])
Output:
Contribution >5500 are:
A 6700
B 5600
dtype: int64
Sorting Series values:
Series object can be sorted based on values and indexes.
• head()
• tail()
• count()
• Series.head() is a series function that fetches the first n elements of a Pandas object.
• By default it gives the first 5 elements of the series.
• Series.tail() is a series function that displays the last n elements, five by default.
• Series.count() returns the number of non-null (non-NaN) elements.
Example 1:
>>> import pandas as pd
>>> s1 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
>>> print(s1.head(3))
Output:
a 1
b 2
c 3
dtype: int64
Example 2:
>>> import pandas as pd
>>> s1 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
>>> print(s1.head())
Output:
a 1
b 2
c 3
d 4
e 5
dtype: int64
Pandas tail() function:
Example 1:
>>> import pandas as pd
>>> s1 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
>>> print(s1.tail(2))
Output:
d 4
e 5
dtype: int64
Example 2:
>>> import pandas as pd
>>> s1 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
>>> print(s1.tail())
Output:
a 1
b 2
c 3
d 4
e 5
dtype: int64
pandas count() function:
>>> print(s1.count())
>>> print(s1.count())
output
output
5 4
Homework
Consider the following code:
>>> import pandas as pd
>>> import numpy as np
>>> s1=pd.Series([12,np.nan,10])
>>> print(s1)
Find the output and write a Python statement to count and display only the non-null
values in the above series.
Output:
i)
0 12.0
1 NaN
2 10.0
dtype: float64
ii) >>> s1.count()
2
Series Object Attributes:
A series exposes its properties through the following attributes:
1) Series.index returns the index of the series.
2) Series.values returns the data as an ndarray.
3) Series.dtype returns the dtype object of the underlying data.
4) Series.shape returns a tuple of the shape of the underlying data.
5) Series.nbytes returns the number of bytes of the underlying data.
6) Series.ndim returns the number of dimensions.
7) Series.size returns the number of elements.
8) Series.hasnans returns True if there is any NaN.
9) Series.empty returns True if the series object is empty.
Naming the Series and the index column
>>> import pandas as pd
>>> s1 = pd.Series({'Jan': 31, 'Feb': 28, 'Mar': 31, 'Apr': 30})
>>> s1.name="Days"
>>> s1.index.name="Months"
>>> print(s1)
Output:
Months
Jan 31
Feb 28
Mar 31
Apr 30
Name: Days, dtype: int64
>>> import pandas as pd
>>> s1 = pd.Series( range(1, 15, 3), index= [x for x in 'abcde'])
>>> s1.index
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
>>> s1.values
array([ 1, 4, 7, 10, 13], dtype=int64)
>>> s1.dtype
dtype('int64')
>>> s1.shape
(5,)
>>> s1.nbytes
40
>>> s1.ndim
1
>>> s1.size
5
>>> s1.hasnans
False
>>> s1.empty
False
Size of one element for each integer dtype (reference: Sumitha Arora, Class 11, pg. no. 297):
• int8: 1 byte
• int16: 2 bytes
• int32: 4 bytes
• int64: 8 bytes
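These sizes explain the nbytes value of 40 shown earlier: 5 elements of 8 bytes each. A small sketch, with the series name s chosen only for illustration, shows how nbytes changes with the dtype.

```python
import pandas as pd

s = pd.Series([1, 4, 7, 10, 13], dtype='int64')
print(s.nbytes)  # number of elements * bytes per element

for name in ['int8', 'int16', 'int32', 'int64']:
    converted = s.astype(name)
    # itemsize is the number of bytes used by one element of this dtype
    print(name, converted.dtype.itemsize, converted.nbytes)
```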
Mathematical operations with Series
e.g. 1:
>>> import pandas as pd
>>> s1 = pd.Series([1, 2, 3])
>>> s2 = pd.Series([1, 2, 4])
>>> s3 = s1 + s2
>>> print(s3)
Output:
0 2
1 4
2 7
dtype: int64
e.g. 2:
>>> import pandas as pd
>>> s1 = pd.Series([1, 2, 3])
>>> s2 = pd.Series([1, 2, 4])
>>> s3 = s1 * s2
>>> print(s3)
Output:
0 1
1 4
2 12
dtype: int64
e.g. 3:
>>> import pandas as pd
>>> import numpy as np
>>> s1 = np.arange(10, 15)
>>> s2 = pd.Series(index=s1, data=s1 * 4)
>>> print(s2)
Output:
10 40
11 44
12 48
13 52
14 56
dtype: int32
e.g. 4:
>>> import pandas as pd
>>> import numpy as np
>>> s1 = np.arange(10, 15)
>>> s2 = pd.Series(index=s1, data=s1 ** 4)
>>> print(s2)
Output:
10 10000
11 14641
12 20736
13 28561
14 38416
dtype: int32
e.g. 5:
>>> import pandas as pd
>>> data = ['I', 'n', 'f', 'o', 'r']
>>> s1 = pd.Series(data + ['m', 'a', 't', 'i', 'c', 's'])
>>> s1
Output:
0 I
1 n
2 f
3 o
4 r
5 m
6 a
7 t
8 i
9 c
10 s
dtype: object
Exercise: concatenate your first name with your last name.
e.g. 6:
>>> import pandas as pd
>>> s1 = ['a', 'b', 'c']
>>> s2 = pd.Series(data=s1 * 2)
>>> print(s2)
Output:
0 a
1 b
2 c
3 a
4 b
5 c
dtype: object
Note:
• Arithmetic operations between two series are performed on elements with matching index
labels; a label that is present in only one of the series gives NaN in the result.
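The note above can be checked with two series whose indexes only partly overlap. This is an illustrative sketch; the data and the names total and filled are made up for the example.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Labels 'a' and 'd' exist in only one series, so their result is NaN.
total = s1 + s2

# add() with fill_value treats a missing partner as 0 instead of producing NaN.
filled = s1.add(s2, fill_value=0)

print(total)
print(filled)
```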
Homework:
Features of DataFrames:
i) Columns can be of different types.
ii) The size of a DataFrame is mutable, i.e. the number of rows and columns can be changed.
iii) Its data values are also mutable.
iv) It has labelled axes (rows and columns).
v) Arithmetic operations can be performed on rows and columns.
vi) Indexes may constitute numbers, strings or characters.
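A few of these features can be seen in a short sketch. The data below is made up for illustration and is not part of the original notes.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Raj', 'Neha'], 'marks': [98.0, 65.7]},
                  index=['I', 'II'])

print(df.dtypes)                              # columns hold different types

df['sport'] = ['Tennis', 'Badminton']         # size is mutable: add a column
df.loc['III'] = ['Sunil', 45.0, 'Football']   # size is mutable: add a row
df.at['I', 'marks'] = 99.0                    # values are mutable

print(df.shape)
print(df)
```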
Create DataFrame:
A DataFrame can be created from the following:
• Lists
• Dictionary
• Series
• NumPy ndarray
• Another DataFrame
>>>import pandas as pd
>>>data=[{'a':10,'b':20},{'a':5,'b':10,'c':20}]
>>>df= pd.DataFrame(data)
>>>print(df)
Output:
a b c
0 10 20 NaN
1 5 10 20.0
v) Creating a DataFrame from a dictionary of lists
Example: WAP to store the name, marks and sport of 5 students in a DataFrame using lists as values.
>>> import pandas as pd
>>> dict1 = {'students': ['Raj', 'Neha', 'Sunil', 'Jamaal', 'Ruchika'],
...          'marks': [98, 65.7, 45, 78, 79],
...          'sport': ['Tennis', 'Badminton', 'Football', 'Squash', 'Kabaddi']}
>>> df1 = pd.DataFrame(dict1)
>>> print(df1)
Output:
students marks sport
0 Raj 98.0 Tennis
1 Neha 65.7 Badminton
2 Sunil 45.0 Football
3 Jamaal 78.0 Squash
4 Ruchika 79.0 Kabaddi
The same dictionary with user-defined row labels:
>>> import pandas as pd
>>> dict1 = {'students': ['Raj', 'Neha', 'Sunil', 'Jamaal', 'Ruchika'],
...          'marks': [98, 65.7, 45, 78, 79],
...          'sport': ['Tennis', 'Badminton', 'Football', 'Squash', 'Kabaddi']}
>>> df1 = pd.DataFrame(dict1, index=['I', 'II', 'III', 'IV', 'V'])
>>> print(df1)
Output:
students marks sport
I Raj 98.0 Tennis
II Neha 65.7 Badminton
III Sunil 45.0 Football
IV Jamaal 78.0 Squash
V Ruchika 79.0 Kabaddi
vi) Creating a DataFrame from a 2D dictionary whose values are dictionary objects:
Example:
Create and display a DataFrame from a 2D dictionary sales, which stores the quarter-wise sales as
an inner dictionary for two years.
>>> import pandas as pd
>>> sales = {'yr1': {'Qtr1': 34500, 'Qtr2': 50000, 'Qtr3': 23000, 'Qtr4': 45000},
...          'yr2': {'Qtr1': 44500, 'Qtr2': 55000, 'Qtr3': 25000, 'Qtr4': 55000}}
>>> dfsales = pd.DataFrame(sales)
>>> print(dfsales)
Output:
yr1 yr2
Qtr1 34500 44500
Qtr2 50000 55000
Qtr3 23000 25000
Qtr4 45000 55000
Homework:
c1 c2 c3
r1 101 113 124
r2 130 140 200
r3 115 216 217
4) Create a DataFrame from Series:
Example 1:
>>> import pandas as pd
>>> student_marks=pd.Series({'Anu':75,'Aarish':98,'Banu':78,'Arpit':89,'Shaurya':97})
>>> student_age=pd.Series({'Anu':15,'Aarish':14,'Banu':18,'Arpit':19,'Shaurya':17})
>>> student_df=pd.DataFrame({'marks':student_marks,'Age':student_age})
>>> print(student_df)
Output:
marks Age
Anu 75 15
Aarish 98 14
Banu 78 18
Arpit 89 19
Shaurya 97 17
Example 2:
>>> s2=pd.Series({'IP':200,'CS':500})
>>> print(df1)
5) Creating a DataFrame object from another DataFrame Object:
Output:
0 1 2
0 1 2 3
1 4 5 6
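The statements that produced this output are missing from the notes. A sketch that gives the same output, assuming the first DataFrame was built from a 2D list and using the hypothetical names df1, df2 and df3, could look like this:

```python
import pandas as pd

# Assumed source data: a 2D list, so rows and columns get default labels 0, 1, 2.
df1 = pd.DataFrame([[1, 2, 3], [4, 5, 6]])

df2 = pd.DataFrame(df1)   # a DataFrame object created from another DataFrame
df3 = df1.copy()          # copy() returns an independent copy of df1

print(df2)
```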
Sorting Data in a dataframe:
Marketing Sales
age 25 24
Name Neha Rohit
Gender Female Male
1) >>> print(dfn.index)
Index(['age', 'Name', 'Gender'], dtype='object')
5) >>> print(dfn.T)   # transpose
age Name Gender
Marketing 25 Neha Female
Sales 24 Rohit Male
6) >>> print(dfn.shape)
(3, 2)
7) >>> print(dfn.shape[0])   # used to see the number of rows
3
8) >>> print(dfn.shape[1])   # used to see the number of columns
2
9) >>> print(dfn.size)
6
10) >>> print(dfn.ndim)
2
11) >>> print(dfn.empty)
False
12) >>> dfn.values
array([[25, 24],
['Neha', 'Rohit'],
['Female', 'Male']], dtype=object)
Methods in Dataframe:
1)>>>print(len(dfn))
3
Output:
Marketing Sales
age 25 24
Name Neha Rohit
5) tail()
print(dfn.tail(2))
Output:
Marketing Sales
Name Neha Rohit
Gender Female Male
7) Selecting / Accessing Data through indexing
Consider a DataFrame Df1
Output:
Schools Hospitals
Delhi 7916 189
Mumbai 8508 208
Kolkata 7226 149
Chennai 7617 157
Output:
Hospitals Schools
Delhi 189 7916
Mumbai 208 8508
Kolkata 149 7226
Chennai 157 7617
Note: Columns appear in the order given in the list in the square brackets
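The statements behind the two outputs above are not shown in the notes. The sketch below rebuilds Df1 from the values printed in this section (the column order Population, Hospitals, Schools is taken from the later outputs) and selects columns with a list of names.

```python
import pandas as pd

Df1 = pd.DataFrame(
    {'Population': [10927986, 12691836, 4631392, 4328063],
     'Hospitals': [189, 208, 149, 157],
     'Schools': [7916, 8508, 7226, 7617]},
    index=['Delhi', 'Mumbai', 'Kolkata', 'Chennai'])

one_column = Df1['Schools']                    # a single name returns a Series
two_columns = Df1[['Hospitals', 'Schools']]    # a list of names returns a DataFrame

print(two_columns)   # columns appear in the order given in the list
```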
• To access selected columns using slicing:
>>> print(Df1.loc[:, "Population":"Schools"])
Output:
Population Hospitals Schools
Delhi 10927986 189 7916
Mumbai 12691836 208 8508
Kolkata 4631392 149 7226
Chennai 4328063 157 7617
Output:
Population Hospitals
Delhi 10927986 189
Mumbai 12691836 208
7) 3) Selecting/ Accessing a subset from a dataframe using row/column names:
• To access a row, give the row label (name) to loc. Don't forget to give a colon after
the comma to select all the columns.
Output:
Population Hospitals Schools
Mumbai 12691836 208 8508
Kolkata 4631392 149 7226
Output:
Population Hospitals Schools
Mumbai 12691836 208 8508
Kolkata 4631392 149 7226
Chennai 4328063 157 7617
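The statements for the outputs above are not included in the notes. Assuming Df1 holds the values printed in this section, selections of this kind can be sketched with label slices in loc; the result names are illustrative.

```python
import pandas as pd

Df1 = pd.DataFrame(
    {'Population': [10927986, 12691836, 4631392, 4328063],
     'Hospitals': [189, 208, 149, 157],
     'Schools': [7916, 8508, 7226, 7617]},
    index=['Delhi', 'Mumbai', 'Kolkata', 'Chennai'])

# Rows Mumbai to Kolkata, all columns (note the colon after the comma).
rows = Df1.loc['Mumbai':'Kolkata', :]

# Rows Mumbai to Chennai, all columns.
rows_to_end = Df1.loc['Mumbai':'Chennai', :]

# A subset of rows and columns; both end labels are included.
subset = Df1.loc['Delhi':'Mumbai', 'Population':'Hospitals']

print(rows)
print(rows_to_end)
print(subset)
```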
7) 4) Selecting Rows / Columns of a DataFrame:
Example 1: Example 2:
Output: Output:
Hospitals Schools Hospitals
Delhi 189 7916 Delhi 189
Mumbai 208 8508 Mumbai 208
7) 5) Selecting/ Accessing Individual Value:
• Give the row name or numeric index in square brackets.
Example:
>>> print(Df1.Population['Delhi'])
Output:
10927986
• We can use the at attribute (row label, column label) or the iat attribute
(row position, column position) with a DataFrame object.
Example:
>>> print(Df1.at['Chennai', 'Schools'])
Output:
7617
>>> print(Df1.iat[3, 2])
Output:
7617
8) Adding/ Modifying Row’s / Column’s Values in DataFrames:
1) Adding / Modifying a Column:
* will modify it, if the column already exists.
* will add a new column, if it does not exist already.
• To change or add a column
>>> Df1["Density"] = 1219
>>> print( Df1 )
Output:
Population Hospitals Schools Density
Delhi 10927986 189 7916 1219
Mumbai 12691836 208 8508 1219
Kolkata 4631392 149 7226 1219
Chennai 4328063 157 7617 1219
Here every row in the new column gets the same value.
We can assign a separate value to each row of the column in the form of a list.
>>> Df1["Density"] = [1500,1219,1630,1050]
>>> print ( Df1 )
Output:
Population Hospitals Schools Density
Delhi 10927986 189 7916 1500
Mumbai 12691836 208 8508 1219
Kolkata 4631392 149 7226 1630
Chennai 4328063 157 7617 1050
8) 2) Adding / Modifying a row:
* will modify it, if the row already exists.
* will add a new row, if it does not exist already.
• To change or add a row Bangalore to the dataframe
>>> df1.loc['Bangalore']=[135614,267,6889,1500]
>>> print(df1)
Output:
Population Hospitals Schools Density
Delhi 10927986 189 7916 1500
Mumbai 12691836 208 8508 1219
Kolkata 4631392 149 7226 1630
Chennai 4328063 157 7617 1050
Bangalore 135614 267 6889 1500
To change or add a row Bangalore to the dataframe using at method
>>> df1.at['Bangalore']=[135614,267,6889,1500]
>>> print(df1)
Output:
Population Hospitals Schools Density
Delhi 10927986 189 7916 1500
Mumbai 12691836 208 8508 1219
Kolkata 4631392 149 7226 1630
Chennai 4328063 157 7617 1050
Bangalore 135614 267 6889 1500
8) 3) Modifying a single cell
Example:
>>> df1.at['Bangalore','Schools']=5678
>>> print(df1)
Output:
Population Hospitals Schools
Delhi 10927986 189 7916
Mumbai 12691836 208 8508
Kolkata 4631392 149 7226
Chennai 4328063 157 7617
Bangalore 135614 267 5678
8) 4) Modifying a single cell
>>> df1.loc['Bangalore','Schools']=5679
>>> print(df1)
Output:
Population Hospitals Schools
Delhi 10927986 189 7916
Mumbai 12691836 208 8508
Kolkata 4631392 149 7226
Chennai 4328063 157 7617
Bangalore 135614 267 5679
Deleting Rows / Columns in a DataFrame:
Example:
Output:
Population Hospitals
Delhi 10927986.0 189.0
Mumbai 12691836.0 208.0
Kolkata 4631392.0 149.0
Chennai 4328063.0 157.0
Bangalore 5678097.0 171.0
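The code for this deletion example is missing from the notes. A hedged sketch of two common ways to delete, using a smaller DataFrame with values borrowed from the earlier examples (the names df, without_schools and without_delhi are illustrative), is:

```python
import pandas as pd

df = pd.DataFrame(
    {'Population': [10927986, 12691836, 4631392],
     'Hospitals': [189, 208, 149],
     'Schools': [7916, 8508, 7226]},
    index=['Delhi', 'Mumbai', 'Kolkata'])

# drop() returns a new DataFrame and leaves df unchanged.
without_schools = df.drop(columns=['Schools'])
without_delhi = df.drop(index=['Delhi'])

# del removes the column from df itself.
del df['Schools']

print(without_schools)
print(without_delhi)
print(df)
```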
9) Renaming Rows/Columns
>>> topdf.rename(index={'Sec A': 'A', 'Sec B': 'B', 'Sec C': 'C', 'Sec D': 'D'}, inplace=True)
>>>print(topdf)
Output:
Roll No Name Marks
A 115 Pavni 97.5
B 236 Rishi 98.0
C 307 Preet 98.5
D 423 Paula 98.0
To change the column label Roll No to Rno:
Output:
Rno Name Marks
A 115 Pavni 97.5
B 236 Rishi 98.0
C 307 Preet 98.5
D 423 Paula 98.0
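The statement that renames the column is not shown in the notes. A sketch that reproduces both outputs, with topdf rebuilt from the printed values, passes a mapping to the columns parameter of rename():

```python
import pandas as pd

topdf = pd.DataFrame(
    {'Roll No': [115, 236, 307, 423],
     'Name': ['Pavni', 'Rishi', 'Preet', 'Paula'],
     'Marks': [97.5, 98.0, 98.5, 98.0]},
    index=['Sec A', 'Sec B', 'Sec C', 'Sec D'])

# Rename the row labels, then the column label, modifying topdf in place.
topdf.rename(index={'Sec A': 'A', 'Sec B': 'B', 'Sec C': 'C', 'Sec D': 'D'},
             inplace=True)
topdf.rename(columns={'Roll No': 'Rno'}, inplace=True)

print(topdf)
```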
10) More on DataFrame Indexing – Boolean Indexing
Boolean indexing means using Boolean values (True or False, or 1 or 0) as
index labels in a DataFrame.
Days No of Classes
True Monday 6
False Tuesday 0
True Wednesday 3
False Thursday 0
True Friday 8
Using Boolean indexing we can divide the DataFrame into two groups True rows and
False rows
10) 1) Creating a DataFrame with Boolean Indexes:
Example 1:
import pandas as pd
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
classes = [6, 0, 3, 0, 8]
dc = {'Days': days, 'No of Classes': classes}
classdf = pd.DataFrame(dc, index=[True, False, True, False, True])
print(classdf)
Output:
Days No of Classes
True Monday 6
False Tuesday 0
True Wednesday 3
False Thursday 0
True Friday 8
Example2:
import pandas as pd
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
classes = [6, 0, 3, 0, 8]
dc = {'Days': days, 'No of Classes': classes}
classdf = pd.DataFrame(dc, index=[1, 0, 1, 0, 1])
print(classdf)
Output:
Days No of Classes
1 Monday 6
0 Tuesday 0
1 Wednesday 3
0 Thursday 0
1 Friday 8
10) 2) Accessing rows from DataFrames using Boolean Indexes:
Boolean indexing is useful for filtering out the True or False indexed rows using loc
attribute.
Output:
Days No of Classes
0 Tuesday 0
0 Thursday 0
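The statement that produced this output is not shown. Using the 1/0 indexed classdf from Example 2, a sketch of the selection is given below; the result names are illustrative, and the mask variant at the end is an alternative that works with any index.

```python
import pandas as pd

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
classes = [6, 0, 3, 0, 8]
classdf = pd.DataFrame({'Days': days, 'No of Classes': classes},
                       index=[1, 0, 1, 0, 1])

no_class_days = classdf.loc[0]   # all rows whose index label is 0
class_days = classdf.loc[1]      # all rows whose index label is 1

# Alternative: build a Boolean mask from a condition on a column.
mask = classdf['No of Classes'] > 0
also_class_days = classdf[mask]

print(no_class_days)
print(class_days)
```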
To Set and Reset index:
subjects.
'English':[56,78,89,90,100],'IP':[78,89,90,67,90],'Maths':[89,90,87,86,90],
'Accounts':[78,89,95,78,89],'Phy':[78,89,90,87,89]}
>>>print(df1)
Output:
Stud_Name English IP Maths Accounts Phy
0 Ajay 56 78 89 78 78
1 Sanjay 78 89 90 89 89
2 Sunil 89 90 87 95 90
3 Amrita 90 67 86 78 87
4 Tom 100 90 90 89 89
>>> df1.set_index('Stud_Name', inplace=True)
>>>print(df1)
>>>df1.set_index(‘Accounts’,inplace=True)
>>>print(df1)
Output:
Stud_Name English IP Maths Accounts Phy
Ajay 56 78 89 78 78
Sanjay 78 89 90 89 89
Sunil 89 90 87 95 90
Amrita 90 67 86 78 87
>>> df1.reset_index(inplace=True)
>>>print(df1)
Output:
Stud_Name English IP Maths Accounts Phy
0 Ajay 56 78 89 78 78
1 Sanjay 78 89 90 89 89
2 Sunil 89 90 87 95 90
3 Amrita 90 67 86 78 87
4 Tom 100 90 90 89 89
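The round trip between a column and the index can be sketched on a smaller DataFrame. The example below uses three of the students and two of the subjects from above, and it returns new objects instead of using inplace=True; the names by_name and restored are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Stud_Name': ['Ajay', 'Sanjay', 'Sunil'],
                    'English': [56, 78, 89],
                    'IP': [78, 89, 90]})

by_name = df1.set_index('Stud_Name')   # Stud_Name becomes the row index
restored = by_name.reset_index()       # the index goes back to being a column

print(by_name)
print(restored)
```

Without inplace=True both methods leave the original DataFrame untouched, which makes it easier to compare the before and after states.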
Iterating over the DataFrame
• There are two methods: DataFrame.iterrows() and DataFrame.items(). In pandas versions
before 2.0, items() was also available under the name iteritems(); iteritems() was
removed in pandas 2.0.
• iterrows() views the dataframe in the form of horizontal subsets (rows).
• items() views the dataframe in the form of vertical subsets (columns).
• The iterrows() method iterates over the dataframe row-wise, where each
horizontal subset is in the form of (row-index, series) and the series contains the
column values for that row-index.
• The items() method iterates over the dataframe column-wise, where each
vertical subset is in the form of (column-index, series) and the series contains all
row values for that column index.
WAP to create a dataframe and iterate over its rows.
>>> import pandas as pd
>>> data = [["Virat", 55, 66, 31], ["Rohit", 88, 66, 43], ["Hardik", 99, 101, 68]]
>>> players = pd.DataFrame(data, columns=["Name", "Match-1", "Match-2", "Match-3"])
>>> print(players)
>>> print("Iterating by rows:")
>>> for (index, row) in players.iterrows():
...     print(index, row.values)
>>> print("Iterating by columns:")
>>> for (index, row) in players.iterrows():
...     print(index, row["Name"], row["Match-1"], row["Match-2"], row["Match-3"])
Output:
Name Match-1 Match-2 Match-3
0 Virat 55 66 31
1 Rohit 88 66 43
2 Hardik 99 101 68
Iterating by rows:
0 ['Virat' 55 66 31]
1 ['Rohit' 88 66 43]
2 ['Hardik' 99 101 68]
Iterating by columns:
0 Virat 55 66 31
1 Rohit 88 66 43
2 Hardik 99 101 68
WAP to create a dataframe and print it column by column using
items() (named iteritems() in pandas versions before 2.0).
>>> import pandas as pd
>>> sc_2yrs = {2016: {'Virat Kohli': 2595, 'Rohit Sharma': 2406, 'Shikhar Dhawan': 2378},
...            2017: {'Virat Kohli': 2818, 'Rohit Sharma': 2613, 'Shikhar Dhawan': 2295}}
>>> df = pd.DataFrame(sc_2yrs)
>>> print(df)
>>> print("-------------------------------------------")
>>> for (year, runs) in df.items():   # df.iteritems() in pandas versions before 2.0
...     print("Year:", year)
...     print(runs)
Output:
2016 2017
Virat Kohli 2595 2818
Rohit Sharma 2406 2613
Shikhar Dhawan 2378 2295
-------------------------------------------
Year: 2016
Virat Kohli 2595
Rohit Sharma 2406
Shikhar Dhawan 2378
Name: 2016, dtype: int64
Year: 2017
Virat Kohli 2818
Rohit Sharma 2613
Shikhar Dhawan 2295
Name: 2017, dtype: int64