
12 Pandas

Pandas is a popular Python library for working with 1D and 2D data sets. It provides pandas Series for 1D data like lists and pandas DataFrame for 2D tabular data. A DataFrame is a two-dimensional data structure with labeled axes (rows and columns). It can hold data of different types and allows arithmetic operations on rows and columns. Pandas provides functions to create, manipulate, and analyze DataFrames including reading/writing data from files and performing operations like filtering, aggregation, joining and more.

Uploaded by

Arshpreet Singh
Copyright
© All Rights Reserved

PANDAS

Pandas is one of the most popular libraries for working
with 1D/2D data sets in the Python programming language.
⦿ For 1D data, such as a sequence of numbers, the
pandas.Series object is appropriate.

#list
myList = ["The", "earth", "revolves", "around", "sun"]
print(myList) #printing list

Output:
['The', 'earth', 'revolves', 'around', 'sun']
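
The example above is a plain Python list. As a minimal sketch, the same values can be wrapped in a pandas.Series; the variable name mySeries is illustrative and does not come from the slides.

```python
import pandas as pd

# Plain Python list from the example above
myList = ["The", "earth", "revolves", "around", "sun"]

# Wrap the list in a Series; pandas assigns a default integer index 0..4
mySeries = pd.Series(myList)
print(mySeries)

# Elements are looked up by their index label
second_word = mySeries[1]
print(second_word)
```

Unlike the list, the Series carries an index alongside the values, which is what later allows alignment by label.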

⦿ For 2D data, the corresponding object is
called pandas.DataFrame.
⦿ 3D data
DATA FRAME
⦿ A Data frame is a two-dimensional data
structure, i.e., data is aligned in a tabular
fashion in rows and columns.

Features of Data Frame


⦿ Columns can be of different types
⦿ Size is mutable (rows and columns can be added or removed)
⦿ Labeled axes (rows and columns)
⦿ Arithmetic operations can be performed on rows
and columns
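
The features listed above can be sketched in a few lines. The data here is hypothetical (the Height values and the AgeNextYear column are invented for illustration), so treat it as a sketch rather than part of the original example set.

```python
import pandas as pd

# Columns of different types: strings, integers, floats (hypothetical data)
df = pd.DataFrame({"Name": ["Aman", "Ajay", "Abhi"],
                   "Age": [10, 12, 13],
                   "Height": [1.40, 1.52, 1.60]})
print(df.dtypes)

# Size-mutable: add a column, then drop a row
df["AgeNextYear"] = df["Age"] + 1   # arithmetic applied to a whole column
df = df.drop(index=0)               # remove the row labeled 0

# Labeled axes: row labels and column labels
print(df.index.tolist())
print(df.columns.tolist())
```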
STRUCTURE
⦿ Let us assume that we are creating a data
frame with student’s data
PANDAS.DATAFRAME
⦿ A pandas DataFrame can be created using the
following constructor −

pandas.DataFrame( data, index, columns, dtype, copy)
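
As a rough illustration of the constructor arguments, the sketch below passes data, index and columns explicitly. The row labels s1, s2, s3 are hypothetical and chosen only to show that the index does not have to be 0, 1, 2.

```python
import pandas as pd

# data: a list of rows
data = [["Aman", 10], ["Ajay", 12], ["Abhi", 13]]

df = pd.DataFrame(data,
                  index=["s1", "s2", "s3"],   # row labels (hypothetical)
                  columns=["Name", "Age"])    # column labels
print(df)

# A single cell is addressed by row label and column label
age_s2 = df.loc["s2", "Age"]
print(age_s2)
```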


•Create an Empty DataFrame
The most basic DataFrame that can be created is an empty DataFrame.
Example:

#import the pandas library and aliasing as pd
import pandas as pd
df = pd.DataFrame()
print(df)

Its output is as follows:

Empty DataFrame
Columns: []
Index: []
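EXAMPLE
import pandas as pd
data = [['Aman',10],['Ajay',12],['Abhi',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
# convert only the numeric column to float; passing dtype=float to the
# constructor would also try to convert the 'Name' strings
df['Age'] = df['Age'].astype(float)
print(df)
Its output is as follows:
   Name   Age
0  Aman  10.0
1  Ajay  12.0
2  Abhi  13.0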
EXAMPLE TO CREATE CSV FILE
import pandas as pd

names = ['Bob','Jessica','Mary','John','Mel']
births = [968, 155, 77, 578, 973]

BabyDataSet = list(zip(names,births))
print(BabyDataSet)

df = pd.DataFrame(data = BabyDataSet, columns=['Names', 'Births'])


print(df)

df.to_csv('demo.csv')

Output

[('Bob', 968), ('Jessica', 155), ('Mary', 77), ('John', 578), ('Mel', 973)]
     Names  Births
0      Bob     968
1  Jessica     155
2     Mary      77
3     John     578
4      Mel     973
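
Because to_csv() writes the row index as an unnamed first column by default, reading the file back with a plain read_csv() produces an extra "Unnamed: 0" column. The sketch below shows this and one way to avoid it; it writes to a temporary folder (an assumption made so the sketch does not touch real files).

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({"Names": ["Bob", "Jessica", "Mary", "John", "Mel"],
                   "Births": [968, 155, 77, 578, 973]})

# Temporary location, used only for this sketch
path = os.path.join(tempfile.mkdtemp(), "demo.csv")
df.to_csv(path)   # the index is written as an unnamed first column

as_column = pd.read_csv(path)               # index comes back as "Unnamed: 0"
as_index = pd.read_csv(path, index_col=0)   # first column is used as the index

print(as_column.columns.tolist())
print(as_index.columns.tolist())
```

Another option is to call to_csv(path, index=False) so that the index is never written.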
COLUMN ADDITION
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
# Adding a new column to an existing DataFrame object with a column label by passing a new Series
print("Adding a new column by passing as Series:")
df['three']=pd.Series([10,20,30],index=['a','b','c'])
print(df)
print("Adding a new column using the existing columns in DataFrame:")
df['four']=df['one']+df['three']
print(df)

⦿ Output
Adding a new column by passing as Series:
   one  two  three
a  1.0    1   10.0
b  2.0    2   20.0
c  3.0    3   30.0
d  NaN    4    NaN
Adding a new column using the existing columns in DataFrame:
   one  two  three  four
a  1.0    1   10.0  11.0
b  2.0    2   20.0  22.0
c  3.0    3   30.0  33.0
d  NaN    4    NaN   NaN
MAX
# importing pandas as pd
import pandas as pd

# Creating the dataframe
df = pd.DataFrame({"A":[12, 4, 5, None, 1],
                   "B":[7, 2, 54, 3, None],
                   "C":[20, 16, 11, 3, 8],
                   "D":[14, 3, None, 2, 6]})

# skip the NaN values while finding the maximum of each row
print(df.max(axis = 1))

Output:
0    20.0
1    16.0
2    54.0
3     3.0
4     8.0
dtype: float64

max() is used to find the maximum value. With axis = 1 it returns the maximum of each row, and missing values (NaN) are skipped by default.
Similarly, to find the minimum value we use min() in place of max().
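
As a small sketch of the same idea, the block below computes the column-wise maximum (the default axis) and the row-wise minimum on the same data. The variable names col_max and row_min are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [12, 4, 5, None, 1],
                   "B": [7, 2, 54, 3, None],
                   "C": [20, 16, 11, 3, 8],
                   "D": [14, 3, None, 2, 6]})

col_max = df.max()         # axis=0 (default): one value per column
row_min = df.min(axis=1)   # axis=1: one value per row, NaN skipped
print(col_max)
print(row_min)
```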
Mean Function in Python pandas
(Dataframe, Row and column wise mean)

mean(): the mean function in pandas calculates the arithmetic mean of
a given set of numbers. It can give the mean of a whole data frame,
the mean of a column, or the mean of rows.

import pandas as pd

#Create a DataFrame
d = { 'Name':['Alisa','Bobby','Cathrine','Madonna','Rocky',
              'Sebastian','Jaqluine', 'Rahul','David','Andrew','Ajay','Teresa'],
      'Score1':[62,47,55,74,31,77,85,63,42,32,71,57],
      'Score2':[89,87,67,55,47,72,76,79,44,92,99,69]}
df = pd.DataFrame(d)
# mean of the numeric columns of the dataframe
# ('Name' holds strings, so it is excluded with numeric_only=True)
print(df.mean(numeric_only=True))
Output:
Score1    58.0
Score2    73.0
dtype: float64
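
The heading also mentions column-wise and row-wise means. A minimal sketch on the first three rows of the score data (a subset chosen only to keep the numbers easy to check) could look like this:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alisa", "Bobby", "Cathrine"],
                   "Score1": [62, 47, 55],
                   "Score2": [89, 87, 67]})

# Mean of a single column
col_mean = df["Score1"].mean()

# Mean of each row, computed over the two score columns only
row_mean = df[["Score1", "Score2"]].mean(axis=1)

print(col_mean)
print(row_mean)
```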
Sorting
import pandas as pd
d = {'one':[2,3,1,4,5], 'two':[5,4,3,2,1], 'letter':['a','a','b','b','c']}
df = pd.DataFrame(d)
test = df.sort_values(['one'], ascending=[False])
print(test)

The output is:

   one  two letter
4    5    1      c
3    4    2      b
1    3    4      a
0    2    5      a
2    1    3      b
If ascending=False, the data is sorted in descending order.
Otherwise, by default, the data is sorted in ascending
order.
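
For comparison, here is a short sketch of the default ascending sort and of sorting by two columns with a different direction for each. The names asc and multi are illustrative.

```python
import pandas as pd

d = {'one': [2, 3, 1, 4, 5], 'two': [5, 4, 3, 2, 1],
     'letter': ['a', 'a', 'b', 'b', 'c']}
df = pd.DataFrame(d)

# Ascending is the default when no 'ascending' argument is given
asc = df.sort_values('one')

# Sort by 'letter' ascending, then by 'one' descending within each letter
multi = df.sort_values(['letter', 'one'], ascending=[True, False])

print(asc)
print(multi)
```

sort_values() returns a new DataFrame and leaves df unchanged, which is why the result is assigned to a new name.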
Groupby
Data (datasets/stackdatasetexample.csv):
name     age  employment_status  state
Anush    23   emp                pb
Ankush   32   unemp              pb
Alisha   21   emp                pb
Rohit    34   emp                hp
Komal    26   unemp              hr
Karthik  29   emp                hr

import pandas as pd
import numpy as np

df1 = pd.read_csv('datasets/stackdatasetexample.csv')
print(df1)
#print(df1.groupby(["state"])[['name']].count())
j=df1['state'].value_counts()
print(j)

Output:
      name  age employment_status state
0    Anush   23               emp    pb
1   Ankush   32             unemp    pb
2   Alisha   21               emp    pb
3    Rohit   34               emp    hp
4    Komal   26             unemp    hr
5  Karthik   29               emp    hr

Output of the commented-out groupby count:
       name
state
hp        1
hr        2
pb        3

Output of value_counts():
pb    3
hr    2
hp    1
Name: state, dtype: int64
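
Since the CSV file is not included with the slides, the following sketch rebuilds the same table inline and runs a count and a mean per group. The mean-age aggregation is an extra illustration, not part of the original example.

```python
import pandas as pd

# Same rows as the table in this section, built inline instead of read from CSV
df1 = pd.DataFrame({
    "name": ["Anush", "Ankush", "Alisha", "Rohit", "Komal", "Karthik"],
    "age": [23, 32, 21, 34, 26, 29],
    "employment_status": ["emp", "unemp", "emp", "emp", "unemp", "emp"],
    "state": ["pb", "pb", "pb", "hp", "hr", "hr"],
})

# Number of rows per state; groupby sorts the group keys alphabetically
counts = df1.groupby("state")["name"].count()

# Average age per state (illustrative extra aggregation)
mean_age = df1.groupby("state")["age"].mean()

print(counts)
print(mean_age)
```

groupby() orders the result by group key, while value_counts() orders it by frequency, which explains the different row order of the two outputs.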
Drop Duplicate and missing value

Duplicate data
A    B  C
foo  0  A
foo  1  A
foo  1  B
bar  1  A
foo  0  A

import pandas as pd

df = pd.read_csv('datasets/dropduplicatesexample.csv')
print(df)
ee=df.drop_duplicates()
#print(ee) #check whole row for duplicacy
e=df.drop_duplicates(subset=['A', 'C'])
print(e) #drop rows which match on columns A and C
e.to_csv("aaa.csv")

Missing data (rows as they appear in the file; some cells are empty)
Aman CSE Python
Anu IT
Anuradha CSE PHP
Nisha BigData
Pankaj CSE
Ankit Java
Rohit IT Android
Anu IT

import pandas as pd

#df = pd.read_csv('datasets/dropnaexample.csv')
df = pd.read_csv('datasets/dropnaexample.csv', header=None)
print(df)
df_drop_missing = df.dropna() #drop every row that has a missing value
#print(df_drop_missing)
#if we want to write a fixed value in those cells which have nan
df_fill = df.fillna(1) #you can fill any number
print(df_fill)
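
The two CSV files are not available here, so the sketch below builds small frames inline. The duplicate table is copied from this section; the missing-value frame is an illustrative subset with assumed column names (name, branch, skill).

```python
import pandas as pd

# Duplicate data from the table above
dup = pd.DataFrame({"A": ["foo", "foo", "foo", "bar", "foo"],
                    "B": [0, 1, 1, 1, 0],
                    "C": ["A", "A", "B", "A", "A"]})

whole_row = dup.drop_duplicates()                # compares entire rows
by_a_c = dup.drop_duplicates(subset=["A", "C"])  # compares only columns A and C

# Illustrative frame with missing cells (None becomes a missing value)
miss = pd.DataFrame({"name": ["Aman", "Anu", "Pankaj"],
                     "branch": ["CSE", "IT", "CSE"],
                     "skill": ["Python", None, None]})

dropped = miss.dropna()           # keep only rows with no missing cell
filled = miss.fillna("unknown")   # replace missing cells with a placeholder

print(whole_row)
print(by_a_c)
print(dropped)
print(filled)
```

By default drop_duplicates() keeps the first occurrence of each duplicate group, so the surviving rows keep their original index labels.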
Filters
Data (datasets/filtersexample.csv):
   name   year  salary
0  Aman   2017  40000
1  Raman  2017  24000
2  Anita  2017  31000
3  Kajal  2017  20000
4  Arun   2017  30000
5  Aman   2017  25000

import pandas as pd
import numpy as np

df = pd.read_csv('datasets/filtersexample.csv')
#print(df)
filtered = df.query('salary>30000') #salary greater than 30,000
#print(filtered)
df_filtered = df[(df.salary >= 30000) & (df.year == 2017)]
#print(df_filtered)
#print(df.salary.unique()) # list of unique items
#print(df.name.nunique()) #give the count of unique values

Output:
print(df)
   Unnamed: 0   name  year  salary
0           0   Aman  2017   40000
1           1  Raman  2017   24000
2           2  Anita  2017   31000
3           3  Kajal  2017   20000
4           4   Arun  2017   30000
5           5   Aman  2017   25000

print(filtered)
   Unnamed: 0   name  year  salary
0           0   Aman  2017   40000
2           2  Anita  2017   31000

print(df_filtered)
   Unnamed: 0   name  year  salary
0           0   Aman  2017   40000
2           2  Anita  2017   31000
4           4   Arun  2017   30000

print(df.salary.unique())
[40000 24000 31000 20000 30000 25000]

print(df.name.nunique())
5
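
The filters above can be tried without the CSV file by building the same table inline, as in this sketch. The variable names high, mask and at_least are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Aman", "Raman", "Anita", "Kajal", "Arun", "Aman"],
                   "year": [2017] * 6,
                   "salary": [40000, 24000, 31000, 20000, 30000, 25000]})

# query() takes the condition as a string
high = df.query("salary > 30000")

# Boolean mask: each condition needs its own parentheses because
# & binds more tightly than the comparison operators
mask = (df["salary"] >= 30000) & (df["year"] == 2017)
at_least = df[mask]

unique_salaries = df["salary"].unique()   # array of distinct values
name_count = df["name"].nunique()         # number of distinct names

print(high)
print(at_least)
print(unique_salaries, name_count)
```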
Joins
df_a
   subject_id first_name last_name
0           1       Ajay  Anderson
1           2       Abhi  Ackerman
2           3       Aman       Ali
3           4        Avi      Aoni
4           5       Aksh   Atiches

df_b
   subject_id first_name last_name
0           4      Billy    Bonder
1           5       Navi     Black
2           6      Swati   Balwner
3           7    Shivali     Brice
4           8      Kamal    Btisan

df_new = pd.concat([df_a, df_b])
df_new

   subject_id first_name last_name
0           1       Ajay  Anderson
1           2       Abhi  Ackerman
2           3       Aman       Ali
3           4        Avi      Aoni
4           5       Aksh   Atiches
0           4      Billy    Bonder
1           5       Navi     Black
2           6      Swati   Balwner
3           7    Shivali     Brice
4           8      Kamal    Btisan
pd.concat([df_a, df_b], axis=1)

   subject_id first_name last_name  subject_id first_name last_name
0           1       Ajay  Anderson           4      Billy    Bonder
1           2       Abhi  Ackerman           5       Navi     Black
2           3       Aman       Ali           6      Swati   Balwner
3           4        Avi      Aoni           7    Shivali     Brice
4           5       Aksh   Atiches           8      Kamal    Btisan
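
The slides show df_a and df_b only as tables. A minimal sketch that builds them and repeats both concatenations follows; the ignore_index=True variant is an addition for illustration, showing how to avoid the repeated 0..4 row labels.

```python
import pandas as pd

df_a = pd.DataFrame({"subject_id": [1, 2, 3, 4, 5],
                     "first_name": ["Ajay", "Abhi", "Aman", "Avi", "Aksh"],
                     "last_name": ["Anderson", "Ackerman", "Ali", "Aoni", "Atiches"]})
df_b = pd.DataFrame({"subject_id": [4, 5, 6, 7, 8],
                     "first_name": ["Billy", "Navi", "Swati", "Shivali", "Kamal"],
                     "last_name": ["Bonder", "Black", "Balwner", "Brice", "Btisan"]})

stacked = pd.concat([df_a, df_b])                        # rows of df_b below df_a
renumbered = pd.concat([df_a, df_b], ignore_index=True)  # fresh index 0..9
side_by_side = pd.concat([df_a, df_b], axis=1)           # columns next to each other

print(stacked)
print(renumbered)
print(side_by_side)
```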


Merge with right join

pd.merge(df_a, df_b, on='subject_id', how='right')

   subject_id first_name_x last_name_x first_name_y last_name_y
0           4          Avi        Aoni        Billy      Bonder
1           5         Aksh     Atiches         Navi       Black
2           6          NaN         NaN        Swati     Balwner
3           7          NaN         NaN      Shivali       Brice
4           8          NaN         NaN        Kamal      Btisan
Merge with left join
“Left outer join produces a complete set of records from Table A, with
the matching records (where available) in Table B. If there is no
match, the right side will contain null.”

pd.merge(df_a, df_b, on='subject_id', how='left')

   subject_id first_name_x last_name_x first_name_y last_name_y
0           1         Ajay    Anderson          NaN         NaN
1           2         Abhi    Ackerman          NaN         NaN
2           3         Aman         Ali          NaN         NaN
3           4          Avi        Aoni        Billy      Bonder
4           5         Aksh     Atiches         Navi       Black
Merge with inner join
“Inner join produces only the set of records
that match in both Table A and Table B.”

pd.merge(df_a, df_b, on='subject_id', how='inner')

   subject_id first_name_x last_name_x first_name_y last_name_y
0           4          Avi        Aoni        Billy      Bonder
1           5         Aksh     Atiches         Navi       Black
Merge with outer join
“Full outer join produces the set of all records in Table A and
Table B, with matching records from both sides where available.
If there is no match, the missing side will contain null.”

pd.merge(df_a, df_b, on='subject_id', how='outer')

   subject_id first_name_x last_name_x first_name_y last_name_y
0           1         Ajay    Anderson          NaN         NaN
1           2         Abhi    Ackerman          NaN         NaN
2           3         Aman         Ali          NaN         NaN
3           4          Avi        Aoni        Billy      Bonder
4           5         Aksh     Atiches         Navi       Black
5           6          NaN         NaN        Swati     Balwner
6           7          NaN         NaN      Shivali       Brice
7           8          NaN         NaN        Kamal      Btisan
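
To compare the four join types side by side, the sketch below runs each one on cut-down versions of df_a and df_b (only subject_id and first_name, to keep it short) and records which subject_id values survive. The results dictionary and the use of indicator=True are additions for illustration.

```python
import pandas as pd

df_a = pd.DataFrame({"subject_id": [1, 2, 3, 4, 5],
                     "first_name": ["Ajay", "Abhi", "Aman", "Avi", "Aksh"]})
df_b = pd.DataFrame({"subject_id": [4, 5, 6, 7, 8],
                     "first_name": ["Billy", "Navi", "Swati", "Shivali", "Kamal"]})

# Collect the surviving keys for each join type
results = {}
for how in ["inner", "left", "right", "outer"]:
    merged = pd.merge(df_a, df_b, on="subject_id", how=how)
    results[how] = merged["subject_id"].tolist()
    print(how, results[how])

# indicator=True adds a _merge column saying where each row came from
flagged = pd.merge(df_a, df_b, on="subject_id", how="outer", indicator=True)
print(flagged[["subject_id", "_merge"]])
```

The _x and _y suffixes seen in the outputs above are added by merge() because both tables have columns with the same name that are not part of the join key.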
