12 Pandas

Pandas is a popular Python library for working with 1D and 2D data sets. It provides pandas Series for 1D data like lists and pandas DataFrame for 2D tabular data. A DataFrame is a two-dimensional data structure with labeled axes (rows and columns). It can hold data of different types and allows arithmetic operations on rows and columns. Pandas provides functions to create, manipulate, and analyze DataFrames including reading/writing data from files and performing operations like filtering, aggregation, joining and more.

Uploaded by Arshpreet Singh


PANDAS

Pandas is the most popular library for working with 1D and 2D data sets in the Python programming language.
⦿ For 1D data, such as a sequence of numbers, the pandas.Series object is the appropriate choice.

#list
myList = ["The", "earth", "revolves", "around", "sun"]
print(myList) #printing list

Output:
['The', 'earth', 'revolves', 'around', 'sun']
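The example above prints a plain Python list. As a minimal sketch, the same words can be wrapped in a pandas Series, which stores the values together with an index (0 to 4 by default):

```python
import pandas as pd

# Same words as the list above, stored in a 1D pandas Series
myList = ["The", "earth", "revolves", "around", "sun"]
s = pd.Series(myList)
print(s)

# A Series carries an index alongside its values
print(s.index.tolist())   # [0, 1, 2, 3, 4]
print(s[2])               # revolves
```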

⦿ For 2D data, the corresponding object is pandas.DataFrame.
⦿ 3D data
DATA FRAME
⦿ A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.

Features of Data Frame

⦿ Columns can be of different types
⦿ Size-mutable: rows and columns can be added or removed
⦿ Labeled axes (rows and columns)
⦿ Arithmetic operations can be performed on rows and columns
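The four features listed above can be seen in one small sketch. The student names, ages and marks here are made-up sample values, and the column names `passed` and `age_next_year` are hypothetical:

```python
import pandas as pd

# Columns of different types: text, integers and floats
df = pd.DataFrame({"name": ["Aman", "Ajay"],
                   "age": [10, 12],
                   "marks": [81.5, 77.0]})
print(df.dtypes)

# Size-mutable: add a column and then a row after creation
df["passed"] = df["marks"] > 80
df.loc[2] = ["Abhi", 13, 90.0, True]

# Labeled axes: select a cell by row label and column label
print(df.loc[2, "name"])   # Abhi

# Arithmetic on a column is applied element-wise to every row
df["age_next_year"] = df["age"] + 1
print(df)
```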
STRUCTURE
⦿ Let us assume that we are creating a data frame with students' data
PANDAS.DATAFRAME
⦿ A pandas DataFrame can be created using the following constructor:

pandas.DataFrame(data, index, columns, dtype, copy)
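A minimal sketch of the constructor parameters: `data` holds the values, `index` and `columns` set the row and column labels, and `dtype` forces one type on every column, so it is best used when all the data is numeric. The labels `r1`, `r2`, `Age` and `Marks` are arbitrary examples:

```python
import pandas as pd

data = [[10, 81.5], [12, 77.0]]

# index = row labels, columns = column labels, dtype = type for every column
df = pd.DataFrame(data, index=["r1", "r2"], columns=["Age", "Marks"], dtype=float)
print(df)
print(df.dtypes)   # both columns are float64
```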

•Create an Empty DataFrame
The most basic DataFrame that can be created is an empty DataFrame.
Example:

#import the pandas library and aliasing as pd

import pandas as pd
df = pd.DataFrame()
print(df)

Its output is as follows:

Empty DataFrame
Columns: []
Index: []
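EXAMPLE
import pandas as pd
data = [['Aman', 10], ['Ajay', 12], ['Abhi', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
# convert only the numeric column to float; recent pandas versions warn about or
# reject dtype=float for the whole frame, because 'Name' holds text
df['Age'] = df['Age'].astype(float)
print(df)

Its output is as follows:
   Name   Age
0  Aman  10.0
1  Ajay  12.0
2  Abhi  13.0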
EXAMPLE TO CREATE CSV FILE
import pandas as pd

names = ['Bob','Jessica','Mary','John','Mel']
births = [968, 155, 77, 578, 973]

BabyDataSet = list(zip(names,births))
print(BabyDataSet)

df = pd.DataFrame(data = BabyDataSet, columns=['Names', 'Births'])

print(df)

df.to_csv('demo.csv')

Output

[('Bob', 968), ('Jessica', 155), ('Mary', 77), ('John', 578), ('Mel', 973)]
     Names  Births
0      Bob     968
1  Jessica     155
2     Mary      77
3     John     578
4      Mel     973
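By default, `to_csv()` also writes the row index as an unnamed first column, which reappears as `Unnamed: 0` when the file is read back (as in the Filters example later). A minimal sketch of the difference, assuming write access to the current directory; the second file name `demo_no_index.csv` is hypothetical:

```python
import pandas as pd

names = ['Bob', 'Jessica', 'Mary', 'John', 'Mel']
births = [968, 155, 77, 578, 973]
df = pd.DataFrame({'Names': names, 'Births': births})

# Default: the row index is written as an extra, unnamed first column
df.to_csv('demo.csv')
with_index = pd.read_csv('demo.csv')
print(with_index.columns.tolist())   # ['Unnamed: 0', 'Names', 'Births']

# index=False writes only the data columns, so the file reads back cleanly
df.to_csv('demo_no_index.csv', index=False)
back = pd.read_csv('demo_no_index.csv')
print(back.equals(df))   # True
```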
COLUMN ADDITION
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
# Add a new column to an existing DataFrame by assigning a Series to a new column label
print("Adding a new column by passing as Series:")
df['three'] = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(df)
print("Adding a new column using the existing columns in DataFrame:")
df['four'] = df['one'] + df['three']
print(df)

⦿ Output
Adding a new column by passing as Series:
   one  two  three
a  1.0    1   10.0
b  2.0    2   20.0
c  3.0    3   30.0
d  NaN    4    NaN
Adding a new column using the existing columns in DataFrame:
   one  two  three  four
a  1.0    1   10.0  11.0
b  2.0    2   20.0  22.0
c  3.0    3   30.0  33.0
d  NaN    4    NaN   NaN
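Assigning to `df['name']` always appends the new column at the end. As a sketch of two other standard options, `insert()` places a column at a chosen position and modifies the frame in place, while `assign()` returns a new DataFrame and leaves the original unchanged. The column names `one_squared` and `total` are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6]})

# insert() puts the new column at position 1 (in place)
df.insert(1, 'one_squared', df['one'] ** 2)

# assign() returns a NEW DataFrame with the extra column; df is unchanged
df2 = df.assign(total=df['one'] + df['two'])

print(df.columns.tolist())    # ['one', 'one_squared', 'two']
print(df2['total'].tolist())  # [5, 7, 9]
```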
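MAX
# importing pandas as pd
import pandas as pd

# Creating the dataframe (None is stored as NaN)
df = pd.DataFrame({"A": [12, 4, 5, None, 1],
                   "B": [7, 2, 54, 3, None],
                   "C": [20, 16, 11, 3, 8],
                   "D": [14, 3, None, 2, 6]})

# maximum of each row; NaN values are skipped by default (skipna=True)
print(df.max(axis=1))

Output:
0    20.0
1    16.0
2    54.0
3     3.0
4     8.0
dtype: float64

max() returns the maximum value. With axis=1 it gives the maximum of each row; with the default axis=0 it gives the maximum of each column.

Similarly, to find the minimum value, use min() in place of max().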
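A sketch of the other variants mentioned above, on the same DataFrame: column-wise maximum (the default axis), row-wise minimum, `skipna=False` to let a missing value propagate into the result, and `idxmax()` to get the row label where each column's maximum occurs:

```python
import pandas as pd

df = pd.DataFrame({"A": [12, 4, 5, None, 1],
                   "B": [7, 2, 54, 3, None],
                   "C": [20, 16, 11, 3, 8],
                   "D": [14, 3, None, 2, 6]})

col_max = df.max()                            # maximum of each column (axis=0)
row_min = df.min(axis=1)                      # minimum of each row, NaN skipped
row_max_strict = df.max(axis=1, skipna=False) # rows containing NaN give NaN
where_max = df.idxmax()                       # row label of each column's maximum

print(col_max, row_min, row_max_strict, where_max, sep="\n\n")
```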
Mean Function in Python pandas
(DataFrame, row-wise and column-wise mean)

mean() in pandas calculates the arithmetic mean of a set of numbers. It can be applied to a whole DataFrame (one mean per column by default), to a single column, or across each row.

import pandas as pd
import numpy as np

#Create a DataFrame
d = { 'Name':['Alisa','Bobby','Cathrine','Madonna','Rocky',
'Sebastian','Jaqluine', 'Rahul','David','Andrew','Ajay','Teresa'],
'Score1':[62,47,55,74,31,77,85,63,42,32,71,57],
'Score2':[89,87,67,55,47,72,76,79,44,92,99,69]}
df = pd.DataFrame(d)
print(df)
# mean of each numeric column; 'Name' holds text, so numeric_only=True skips it
# (pandas 2.0 and later raise a TypeError without it)
print(df.mean(numeric_only=True))
Output:
Score1    58.0
Score2    73.0
dtype: float64
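The heading promises row-wise means as well. A minimal sketch on the first three students from the table above: `axis=1` averages across the score columns of each row, and calling `mean()` on a single column gives one number. The column name `Average` is made up:

```python
import pandas as pd

d = {'Name': ['Alisa', 'Bobby', 'Cathrine'],
     'Score1': [62, 47, 55],
     'Score2': [89, 87, 67]}
df = pd.DataFrame(d)

# Column-wise mean of the numeric columns
col_means = df.mean(numeric_only=True)

# Row-wise mean: average of Score1 and Score2 for each student
df['Average'] = df[['Score1', 'Score2']].mean(axis=1)
print(df)

# Mean of a single column
score1_mean = df['Score1'].mean()
print(col_means, score1_mean, sep="\n")
```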
Sorting:
import pandas as pd
d = {'one': [2, 3, 1, 4, 5], 'two': [5, 4, 3, 2, 1], 'letter': ['a', 'a', 'b', 'b', 'c']}
df = pd.DataFrame(d)
test = df.sort_values(['one'])   # ascending order by default
print(test)

The output is:

   one  two letter
2    1    3      b
0    2    5      a
1    3    4      a
3    4    2      b
4    5    1      c

By default, sort_values() sorts in ascending order. Pass ascending=False to sort in descending order instead.
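A short sketch of descending order and of sorting by several columns at once, where `ascending` takes one flag per sort key (here: `letter` ascending, then `one` descending within each letter):

```python
import pandas as pd

d = {'one': [2, 3, 1, 4, 5], 'two': [5, 4, 3, 2, 1], 'letter': ['a', 'a', 'b', 'b', 'c']}
df = pd.DataFrame(d)

# Descending order on a single column
desc = df.sort_values('one', ascending=False)
print(desc['one'].tolist())   # [5, 4, 3, 2, 1]

# Sort by 'letter' ascending, then by 'one' descending within each letter
multi = df.sort_values(['letter', 'one'], ascending=[True, False])
print(multi)
```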
Groupby
Input data (datasets/stackdatasetexample.csv), as printed by print(df1):

      name  age employment_status state
0    Anush   23               emp    pb
1   Ankush   32             unemp    pb
2   Alisha   21               emp    pb
3    Rohit   34               emp    hp
4    Komal   26             unemp    hr
5  Karthik   29               emp    hr

import pandas as pd
import numpy as np

df1 = pd.read_csv('datasets/stackdatasetexample.csv')
print(df1)
# number of names in each state
print(df1.groupby(["state"])[['name']].count())
# the same counts, sorted from most to least frequent
j = df1['state'].value_counts()
print(j)

Output:
       name
state
hp        1
hr        2
pb        3

pb    3
hr    2
hp    1
Name: state, dtype: int64

(In pandas 2.0 and later, value_counts() labels this result "Name: count, dtype: int64" and prints the index name "state" above the counts.)
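Since the CSV file is not included, this sketch rebuilds the same six rows in memory so it runs on its own. It shows `agg()` computing several statistics per group in one call, and grouping by two columns with `size()`:

```python
import pandas as pd

# Same rows as the table above, built in memory instead of read from the CSV
df1 = pd.DataFrame({
    'name': ['Anush', 'Ankush', 'Alisha', 'Rohit', 'Komal', 'Karthik'],
    'age': [23, 32, 21, 34, 26, 29],
    'employment_status': ['emp', 'unemp', 'emp', 'emp', 'unemp', 'emp'],
    'state': ['pb', 'pb', 'pb', 'hp', 'hr', 'hr'],
})

# Count, mean age and maximum age for each state
summary = df1.groupby('state')['age'].agg(['count', 'mean', 'max'])
print(summary)

# Number of people per (state, employment_status) pair
sizes = df1.groupby(['state', 'employment_status']).size()
print(sizes)
```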
Drop Duplicate and missing value

Duplicate data (datasets/dropduplicatesexample.csv):

A    B  C
foo  0  A
foo  1  A
foo  1  B
bar  1  A
foo  0  A

import pandas as pd

df = pd.read_csv('datasets/dropduplicatesexample.csv')
print(df)
ee = df.drop_duplicates()   # check the whole row for duplicates
#print(ee)
e = df.drop_duplicates(subset=['A', 'C'])   # drop rows that match on columns A and C
print(e)
e.to_csv("aaa.csv")

Missing data (datasets/dropnaexample.csv, no header row; blank cells are missing values):

Aman      CSE  Python
Anu       IT
Anuradha  CSE  PHP
Nisha          BigData
Pankaj    CSE
Ankit          Java
Rohit     IT   Android
Anu       IT

import pandas as pd

df = pd.read_csv('datasets/dropnaexample.csv', header=None)
print(df)
df_drop_missing = df.dropna()   # drop every row that has at least one missing value
#print(df_drop_missing)
df_fill = df.fillna(0)   # write 0 in the cells that hold NaN (any value can be used)
print(df_fill)
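A self-contained sketch of the same operations, since the CSV files are not included. The duplicate data is rebuilt from the table above; the missing-data frame is a hypothetical three-row sample, and its column names `name`, `branch` and `course` are assumptions (the original file has no header). It also shows `keep='last'`, `dropna(subset=...)`, and per-column fill values:

```python
import pandas as pd

# Duplicate data from the table above
dup = pd.DataFrame({'A': ['foo', 'foo', 'foo', 'bar', 'foo'],
                    'B': [0, 1, 1, 1, 0],
                    'C': ['A', 'A', 'B', 'A', 'A']})

whole_row = dup.drop_duplicates()                      # row 4 repeats row 0
on_a_c = dup.drop_duplicates(subset=['A', 'C'])        # keeps first of each (A, C) pair
on_a_c_last = dup.drop_duplicates(subset=['A', 'C'], keep='last')  # keeps the last one

# Hypothetical sample with gaps (None becomes NaN)
missing = pd.DataFrame({'name': ['Aman', 'Anu', 'Nisha'],
                        'branch': ['CSE', 'IT', None],
                        'course': ['Python', None, 'BigData']})

complete_rows = missing.dropna()                       # only fully filled rows
has_branch = missing.dropna(subset=['branch'])         # only rows with a branch
filled = missing.fillna({'branch': 'unknown', 'course': 'none'})  # per-column defaults
print(whole_row, on_a_c, on_a_c_last, complete_rows, has_branch, filled, sep="\n\n")
```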
Filters
Input data (datasets/filtersexample.csv):

   name  year  salary
0  Aman  2017   40000
1  Raman 2017   24000
2  Anita 2017   31000
3  Kajal 2017   20000
4  Arun  2017   30000
5  Aman  2017   25000

import pandas as pd
import numpy as np

df = pd.read_csv('datasets/filtersexample.csv')
print(df)
filtered = df.query('salary > 30000')   # salary greater than 30,000
print(filtered)
df_filtered = df[(df.salary >= 30000) & (df.year == 2017)]   # salary of at least 30,000 in 2017
print(df_filtered)
print(df.salary.unique())   # array of unique values
print(df.name.nunique())    # count of unique values

Output:

   Unnamed: 0   name  year  salary
0           0   Aman  2017   40000
1           1  Raman  2017   24000
2           2  Anita  2017   31000
3           3  Kajal  2017   20000
4           4   Arun  2017   30000
5           5   Aman  2017   25000

   Unnamed: 0   name  year  salary
0           0   Aman  2017   40000
2           2  Anita  2017   31000

   Unnamed: 0   name  year  salary
0           0   Aman  2017   40000
2           2  Anita  2017   31000
4           4   Arun  2017   30000

[40000 24000 31000 20000 30000 25000]
5

The "Unnamed: 0" column is the row index that was saved into the CSV file; reading with pd.read_csv(..., index_col=0) uses it as the index instead.
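A few more filtering tools, sketched on the same six rows built in memory: `isin()` for membership in a list, `between()` for an inclusive range, `.str.startswith()` for text, and `@` to use a Python variable inside `query()`. The variable name `limit` is arbitrary:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Aman', 'Raman', 'Anita', 'Kajal', 'Arun', 'Aman'],
                   'year': [2017] * 6,
                   'salary': [40000, 24000, 31000, 20000, 30000, 25000]})

by_name = df[df['name'].isin(['Aman', 'Arun'])]          # name in a list
in_range = df[df['salary'].between(25000, 31000)]        # inclusive on both ends
starts_a = df[df['name'].str.startswith('A')]            # names beginning with 'A'

limit = 30000
high = df.query('salary >= @limit and name != "Arun"')   # @ refers to the variable

print(by_name, in_range, starts_a, high, sep="\n\n")
```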
Joins

df_a
   subject_id first_name last_name
0           1       Ajay  Anderson
1           2       Abhi  Ackerman
2           3       Aman       Ali
3           4        Avi      Aoni
4           5       Aksh   Atiches

df_b
   subject_id first_name last_name
0           4      Billy    Bonder
1           5       Navi     Black
2           6      Swati   Balwner
3           7    Shivali     Brice
4           8      Kamal    Btisan
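The code that creates df_a and df_b is not shown. A minimal sketch that builds both frames from the tables above, then stacks them with `concat()`; by default each frame keeps its own index (0 to 4 twice), while `ignore_index=True` renumbers the rows 0 to 9:

```python
import pandas as pd

df_a = pd.DataFrame({'subject_id': [1, 2, 3, 4, 5],
                     'first_name': ['Ajay', 'Abhi', 'Aman', 'Avi', 'Aksh'],
                     'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']})
df_b = pd.DataFrame({'subject_id': [4, 5, 6, 7, 8],
                     'first_name': ['Billy', 'Navi', 'Swati', 'Shivali', 'Kamal'],
                     'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']})

stacked = pd.concat([df_a, df_b])                        # index 0-4, then 0-4 again
df_new = pd.concat([df_a, df_b], ignore_index=True)      # index 0-9
print(stacked.index.tolist())
print(df_new)
```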

pd.concat([df_a, df_b]) stacks the rows of df_b below those of df_a:

df_new = pd.concat([df_a, df_b])
df_new

   subject_id first_name last_name
0           1       Ajay  Anderson
1           2       Abhi  Ackerman
2           3       Aman       Ali
3           4        Avi      Aoni
4           5       Aksh   Atiches
0           4      Billy    Bonder
1           5       Navi     Black
2           6      Swati   Balwner
3           7    Shivali     Brice
4           8      Kamal    Btisan

With axis=1, the two frames are placed side by side instead:

pd.concat([df_a, df_b], axis=1)

   subject_id first_name last_name  subject_id first_name last_name
0           1       Ajay  Anderson           4      Billy    Bonder
1           2       Abhi  Ackerman           5       Navi     Black
2           3       Aman       Ali           6      Swati   Balwner
3           4        Avi      Aoni           7    Shivali     Brice
4           5       Aksh   Atiches           8      Kamal    Btisan

In the merges below, both frames have first_name and last_name columns, so merge() adds the suffix _x to the columns from df_a and _y to those from df_b.

Merge with right join
“Right outer join produces a complete set of records from Table B, with
the matching records (where available) in Table A. If there is no
match, the left side will contain null.”

pd.merge(df_a, df_b, on='subject_id', how='right')

   subject_id first_name_x last_name_x first_name_y last_name_y
0           4          Avi        Aoni        Billy      Bonder
1           5         Aksh     Atiches         Navi       Black
2           6          NaN         NaN        Swati     Balwner
3           7          NaN         NaN      Shivali       Brice
4           8          NaN         NaN        Kamal      Btisan

Merge with left join
“Left outer join produces a complete set of records from Table A, with
the matching records (where available) in Table B. If there is no
match, the right side will contain null.”

pd.merge(df_a, df_b, on='subject_id', how='left')

   subject_id first_name_x last_name_x first_name_y last_name_y
0           1         Ajay    Anderson          NaN         NaN
1           2         Abhi    Ackerman          NaN         NaN
2           3         Aman         Ali          NaN         NaN
3           4          Avi        Aoni        Billy      Bonder
4           5         Aksh     Atiches         Navi       Black

Merge with inner join
“Inner join produces only the set of records
that match in both Table A and Table B.”

pd.merge(df_a, df_b, on='subject_id', how='inner')

   subject_id first_name_x last_name_x first_name_y last_name_y
0           4          Avi        Aoni        Billy      Bonder
1           5         Aksh     Atiches         Navi       Black

Merge with outer join
“Full outer join produces the set of all records in Table A and
Table B, with matching records from both sides where available.
If there is no match, the missing side will contain null.”

pd.merge(df_a, df_b, on='subject_id', how='outer')

   subject_id first_name_x last_name_x first_name_y last_name_y
0           1         Ajay    Anderson          NaN         NaN
1           2         Abhi    Ackerman          NaN         NaN
2           3         Aman         Ali          NaN         NaN
3           4          Avi        Aoni        Billy      Bonder
4           5         Aksh     Atiches         Navi       Black
5           6          NaN         NaN        Swati     Balwner
6           7          NaN         NaN      Shivali       Brice
7           8          NaN         NaN        Kamal      Btisan
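Two merge() options that make joins easier to check, shown in a short sketch: `suffixes` replaces the default `_x`/`_y` with clearer names (the `_a`/`_b` choice here is arbitrary), and `indicator=True` adds a `_merge` column saying whether each row came from the left frame only, the right frame only, or both:

```python
import pandas as pd

df_a = pd.DataFrame({'subject_id': [1, 2, 3, 4, 5],
                     'first_name': ['Ajay', 'Abhi', 'Aman', 'Avi', 'Aksh'],
                     'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']})
df_b = pd.DataFrame({'subject_id': [4, 5, 6, 7, 8],
                     'first_name': ['Billy', 'Navi', 'Swati', 'Shivali', 'Kamal'],
                     'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']})

# Outer join with readable suffixes and a column recording each row's origin
merged = pd.merge(df_a, df_b, on='subject_id', how='outer',
                  suffixes=('_a', '_b'), indicator=True)
print(merged[['subject_id', 'first_name_a', 'first_name_b', '_merge']])
```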
