Pandas Cheat Sheet

This document provides an overview of key pandas functionality for reading, writing, manipulating, and analyzing data in Python. It covers topics such as installing pandas, importing data from files, creating and accessing DataFrames and Series, descriptive statistics, filtering and grouping data, joining DataFrames, and cleaning data. The document serves as a helpful introduction and reference guide for common pandas operations.

Uploaded by

Turya Ganguly

1. Installation and Importing

Installing: pip install pandas
Importing convention: import pandas as pd

2. Reading and Writing Data

Reading data: df = pd.read_csv('filename.csv')
# can extend to JSON, Excel, etc. using pd.read_json, pd.read_excel, etc.

Writing data: df.to_csv('filename.csv')
# can extend to JSON, Excel, etc. using df.to_json, df.to_excel, etc.
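As a minimal round-trip sketch of the two calls above (the filename 'example.csv' and the toy data are made up for this demo):

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'b'], 'id': [1, 2]})
df.to_csv('example.csv', index=False)  # index=False avoids writing the row index as an extra column

df_back = pd.read_csv('example.csv')   # read the same file back
print(df_back.shape)                   # (2, 2)
```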

3. Series and DataFrames

a. Creating a Series
pd.Series(['a', 'b', 'c'])

b. Creating a DataFrame
Row-oriented: pd.DataFrame([['a', 1], ['b', 2]], columns=['name', 'id'])
Column-oriented: pd.DataFrame({'name': ['a', 'b'], 'id': [1, 2]})
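The two construction styles build the same DataFrame, which a quick sketch can confirm:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c'])

# row-oriented: a list of rows plus column names
row_oriented = pd.DataFrame([['a', 1], ['b', 2]], columns=['name', 'id'])
# column-oriented: a dict mapping column name -> column values
col_oriented = pd.DataFrame({'name': ['a', 'b'], 'id': [1, 2]})

print(row_oriented.equals(col_oriented))  # True
```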

4. Info Extraction

Shape (returns a tuple representing the dimensionality of the DataFrame): df.shape
# e.g. (2, 3) for 2 rows and 3 columns
Head (first n rows, default 5): df.head(n)
Tail (last n rows, default 5): df.tail(n)
Info (summary of all columns): df.info()
Describe (statistical summary of the data): df.describe()
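A short sketch of these inspection calls on a toy DataFrame (column names are made up):

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'b', 'c'], 'id': [1, 2, 3]})

print(df.shape)    # (3, 2): 3 rows, 2 columns
print(df.head(2))  # first 2 rows
print(df.tail(1))  # last row

# describe() reports count/mean/std/min/quartiles/max for numeric columns
stats = df.describe()
print(stats.loc['mean', 'id'])  # 2.0
```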

5. Accessing and Indexing

a. Directly accessing columns and rows, as well as both together

Accessing a row:
df.loc[ei]   # ei is the explicit (label-based) index
df.iloc[ii]  # ii is the implicit (position-based) index

Accessing a column:
df['column_name']     # single column
df[['col1', 'col2']]  # multiple columns

b. Slicing

Rows:
df.loc[1:3]   # 1 and 3 are explicit indices; loc includes both endpoints
df.iloc[2:4]  # 2 and 4 are implicit indices; iloc excludes the end
Columns: df.loc[:, 'a':'b']
Both rows and columns: df.loc[1:3, 'a':'b']  # 1 and 3 are explicit indices
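The explicit/implicit distinction is easiest to see when the labels differ from the positions; a minimal sketch with made-up labels 10–13:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]},
                  index=[10, 11, 12, 13])  # explicit labels differ from positions

print(df.loc[11, 'a'])   # 2 -> label-based lookup
print(df.iloc[1]['a'])   # 2 -> position-based lookup, same row

# loc slicing includes both endpoints; iloc excludes the end
print(len(df.loc[10:12]))  # 3 rows (labels 10, 11, 12)
print(len(df.iloc[0:2]))   # 2 rows (positions 0, 1)
```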

c. Feature exploration (masking, filtering)

Masking (creates a boolean mask based on the required condition):
df['col'] > value
# E.g. df['age'] > 30

Filtering (filters data based on conditions):
df.loc[(df['col1'] == val1) & (df['col2'] == val2)]
# E.g. df.loc[(df['month'] == 'January') & (df['year'] == '2022')]
# filters out data for January 2022
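A runnable sketch of masking and filtering, using hypothetical month/year/sales columns:

```python
import pandas as pd

df = pd.DataFrame({'month': ['January', 'January', 'March'],
                   'year':  [2022, 2021, 2022],
                   'sales': [100, 80, 120]})

mask = df['sales'] > 90  # boolean Series, one True/False per row
print(mask.tolist())     # [True, False, True]

# combine conditions with & (and parentheses around each condition)
jan_2022 = df.loc[(df['month'] == 'January') & (df['year'] == 2022)]
print(len(jan_2022))     # 1
```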

6. DataFrame Manipulation
a. Adding a new row/column

Row: df.loc[len(df.index)] = ['a', 1]
# adds a row at the end of the DataFrame
Column: df['new_col'] = data
b. Deleting a row/column

Row: df.drop(labels=None, axis=0)
# E.g. df.drop(3, axis=0)
# 3 is the explicit index; axis=0 selects rows
Column: df.drop('col_name', axis=1)

c. Renaming a column/row

Column: df.rename({'old_name': 'new_name'}, axis=1)
Row: df.index = new_indices

d. Duplicates and dropping duplicates

i. Find duplicate rows
df.duplicated(subset=None, keep='first')
# subset can specify certain column(s) for identifying duplicates
# keep determines which duplicates to mark:
#   'first': mark duplicates as True except for the first occurrence
#   'last': mark duplicates as True except for the last occurrence
#   False: mark all duplicates as True
# Returns a boolean Series with each duplicate row marked True

ii. Drop duplicate rows
df.drop_duplicates(subset=None, keep='first')
# Parameters have the same meaning as in df.duplicated, except
# here the rows marked duplicate are dropped
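A small sketch of duplicated/drop_duplicates on toy data with one repeated row:

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'b', 'a'], 'id': [1, 2, 1]})

dupes = df.duplicated()   # row 2 repeats row 0; keep='first' is the default
print(dupes.tolist())     # [False, False, True]

deduped = df.drop_duplicates()
print(len(deduped))       # 2
```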
7. Operations
a. Sorting
df.sort_values(['col1'], ascending=[True])
b. Built-in ops
● Built-in ops such as mean, min, max, etc.
● E.g. df['col1'].min(), df['col1'].count(), etc.
c. Apply
Applies a function along one of the axes of the DataFrame:
df['col'].apply(function)

E.g.
data[['revenue', 'budget']].apply(np.sum, axis=1)
# sums the values of revenue and budget across each row
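A runnable version of the sorting and apply examples, with made-up revenue/budget figures:

```python
import pandas as pd
import numpy as np

data = pd.DataFrame({'revenue': [10, 30, 20], 'budget': [5, 10, 15]})

print(data.sort_values(['revenue'], ascending=[True])['revenue'].tolist())  # [10, 20, 30]
print(data['revenue'].min())  # 10

# apply np.sum along axis=1 to total revenue + budget per row
totals = data[['revenue', 'budget']].apply(np.sum, axis=1)
print(totals.tolist())        # [15, 40, 35]
```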
8. Joins
a. Concat
pd.concat([df1, df2], axis=0)  # for concatenating horizontally, change to axis=1
b. Merge
df1.merge(df2, on='foreign_key', how='type_of_join')
● Optional: left_on and right_on (when the key columns have different names)
● E.g. df1.merge(df2, on='id', how='inner')
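A sketch of both join styles on two hypothetical frames sharing an 'id' key:

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2], 'name': ['a', 'b']})
df2 = pd.DataFrame({'id': [2, 3], 'score': [90, 80]})

stacked = pd.concat([df1, df1], axis=0)  # stacks rows; axis=1 would place frames side by side
print(len(stacked))                       # 4

merged = df1.merge(df2, on='id', how='inner')  # keeps only ids present in both
print(merged['name'].tolist())                 # ['b']
```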
9. Groupby

Grouping based on a single aggregate:
df.groupby('group_col_name')['col(s)'].aggregate_function()
# E.g.
df.groupby('director_name')['title'].count()
# finds the number of titles per director

Grouping based on multiple aggregates:
df.groupby(['group_col_name'])['col'].aggregate(['func1', 'func2'])
# E.g.
df.groupby(['director_name'])['year'].aggregate(['min', 'max'])
# finds the first and most recent year of movies made by each director

Group-based filtering:
df.groupby('group_col_name').filter(function returning a boolean per group)
# E.g.
data.groupby('director_name').filter(lambda x: x['budget'].max() >= 100)
# keeps all rows of those directors whose maximum budget is at least 100 (million)

Group-based apply:
df.groupby('group_col_name').apply(function)
# E.g.
def func(x):
    x['risky'] = x['budget'] - x['revenue'].mean() >= 0
    return x
data_risky = data.groupby('director_name').apply(func)
# flags movies whose budget is higher than their director's average revenue
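The groupby patterns above can be exercised end to end on a toy movies table (director/title/year values are invented):

```python
import pandas as pd

data = pd.DataFrame({'director_name': ['X', 'X', 'Y'],
                     'title': ['m1', 'm2', 'm3'],
                     'year': [2000, 2010, 2005]})

# single aggregate: titles per director
counts = data.groupby('director_name')['title'].count()
print(counts['X'])            # 2

# multiple aggregates: first and latest year per director
spans = data.groupby(['director_name'])['year'].aggregate(['min', 'max'])
print(spans.loc['X', 'max'])  # 2010

# group-based filtering: keep directors with at least 2 movies
prolific = data.groupby('director_name').filter(lambda x: len(x) >= 2)
print(len(prolific))          # 2
```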

10. Cleaning our data

a. None and NaN
● NaN appears in columns with numbers as their values
● None appears in columns with non-number entries (e.g. string/object type)
● Check for null values using isna():
  ○ df.isna()  # returns a DataFrame with True/False for null values in each element's position
  ○ df.isna().sum()  # returns the number of null values per column; use df.isna().sum(axis=1) for each row's null count
  ○ df.isna().sum().sum()  # returns the total number of null values
b. Filling null values
df.fillna(n)  # fills null values with value n
c. Dropping null values
df.dropna(axis=0)
# Default axis=0 drops rows; use axis=1 for columns
# Drops rows/columns with even a single missing value
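A quick sketch of the null-handling calls, on a toy frame with one NaN and one None:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': ['x', 'y', None]})

print(df.isna().sum().sum())      # 2 null values in total

filled = df.fillna(0)             # replace every null with 0
print(filled.isna().sum().sum())  # 0

dropped = df.dropna(axis=0)       # drops any row with even one missing value
print(len(dropped))               # 1
```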

11. Data Restructuring

Melt (converts a DataFrame from wide to long format):
pd.melt(df, id_vars=['list of columns'])
# E.g.
pd.melt(data, id_vars=['Date', 'Parameter', 'Drug_Name'])
# melts all the columns except the ones mentioned in the id_vars list

Pivot (opposite of melt; converts a DataFrame from long to wide format; outputs a multi-index DataFrame):
df.pivot(index=['list of columns'], columns='col_name', values='col_name')
# E.g.
data_melt.pivot(index=['Date', 'Drug_Name', 'Parameter'], columns='time', values='reading')
# keeps the index columns mentioned as constant, while making new columns
# from the 'time' column, whose values will be the ones in the 'reading' column

Cut (bins continuous data into categorical groups):
df['new_cat_column'] = pd.cut(df['continuous_col'], bins=bin_values, labels=label_values)
# E.g.
data_tidy['temp_cat'] = pd.cut(data_tidy['Temperature'], bins=temp_points, labels=temp_labels)
# bins the Temperature column into the respective bins, labelled as per temp_labels

Shift (shifts the values of rows/columns):
df['col'].shift(periods=n, axis=0)
# E.g.
df['Marks'].shift(periods=1, axis=0)
# shifts the values of the Marks column by one: the first row becomes NaN,
# the second row takes the first row's value, and so on
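Melt and pivot are inverses, which a small round-trip sketch can show (the morning/evening readings are invented):

```python
import pandas as pd

wide = pd.DataFrame({'Drug_Name': ['d1', 'd2'],
                     'morning': [1.0, 2.0],
                     'evening': [3.0, 4.0]})

# wide -> long: every non-id_vars column becomes (variable, value) pairs
long = pd.melt(wide, id_vars=['Drug_Name'])
print(len(long))   # 4 rows: 2 drugs x 2 time-of-day columns

# long -> wide again: one column per distinct 'variable'
back = long.pivot(index='Drug_Name', columns='variable', values='value')
print(back.loc['d1', 'evening'])  # 3.0
```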

12. Misc Topics

a. Datetime
i. Convert to a datetime object: pd.to_datetime(df['col'])
ii. Extracting information

df['col'][0].year  # extracts the year for the 0th index value (0 is the implicit index); use .month and .day for the respective fields
df['col'].dt.year  # extracts the year for the whole column (all the datetime values)
df['col'][0].strftime('%m%Y')  # formats the selected datetime value (0th index here) into the required format (month and year in this case)

b. String functions
Use the .str accessor to apply string functions to any column:
df['col'].str.function()

E.g.
i. data_tidy['Date'].str.split('-')
# splits the 'Date' column into elements separated by '-'
ii. data_tidy.loc[data_tidy['Drug_Name'].str.contains('hydrochloride')]
# filters out rows containing the string 'hydrochloride'
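A combined sketch of the datetime and string accessors, on two invented rows:

```python
import pandas as pd

df = pd.DataFrame({'when': ['2022-01-15', '2023-06-01'],
                   'Drug_Name': ['abc hydrochloride', 'xyz']})

df['when'] = pd.to_datetime(df['when'])    # convert strings to datetime
print(df['when'].dt.year.tolist())          # [2022, 2023]
print(df['when'][0].strftime('%m%Y'))       # '012022' (month then year)

# .str applies string methods element-wise
hits = df.loc[df['Drug_Name'].str.contains('hydrochloride')]
print(len(hits))                            # 1
```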
