Pandas Cheat Sheet

The document discusses pandas, a Python library for data analysis and manipulation. It provides a cheat sheet of pandas syntax and methods for working with DataFrames. Key points covered include:
- Creating and manipulating DataFrames
- Reshaping data through operations like melt, pivot, and concatenation
- Filtering and subsetting DataFrames
- Grouping and aggregating data
- Handling missing data
- Visualizing data through plotting methods

Uploaded by

shan halder


Data Wrangling with pandas Cheat Sheet

http://pandas.pydata.org
Tidy Data – A foundation for wrangling in pandas

In a tidy data set:
Each variable is saved in its own column.
Each observation is saved in its own row.

Tidy data complements pandas’s vectorized operations. pandas will automatically preserve observations as you manipulate variables. No other format works as intuitively with pandas.

Syntax – Creating DataFrames

     a  b   c
1    4  7  10
2    5  8  11
3    6  9  12

df = pd.DataFrame(
    {"a": [4, 5, 6],
     "b": [7, 8, 9],
     "c": [10, 11, 12]},
    index=[1, 2, 3])
Specify values for each column.

df = pd.DataFrame(
    [[4, 7, 10],
     [5, 8, 11],
     [6, 9, 12]],
    index=[1, 2, 3],
    columns=['a', 'b', 'c'])
Specify values for each row.

n  v    a  b   c
d  1    4  7  10
   2    5  8  11
e  2    6  9  12

df = pd.DataFrame(
    {"a": [4, 5, 6],
     "b": [7, 8, 9],
     "c": [10, 11, 12]},
    index=pd.MultiIndex.from_tuples(
        [('d', 1), ('d', 2), ('e', 2)],
        names=['n', 'v']))
Create DataFrame with a MultiIndex.
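As a quick check of the three constructors above, the following sketch builds the same table by column and by row and confirms the two results match. The variable names (`df_cols`, `df_rows`, `df_multi`) are made up for illustration.

```python
import pandas as pd

# Column-wise: one dict entry per column.
df_cols = pd.DataFrame(
    {"a": [4, 5, 6], "b": [7, 8, 9], "c": [10, 11, 12]},
    index=[1, 2, 3],
)

# Row-wise: one inner list per row, with column names given separately.
df_rows = pd.DataFrame(
    [[4, 7, 10], [5, 8, 11], [6, 9, 12]],
    index=[1, 2, 3],
    columns=["a", "b", "c"],
)

# Same values, but indexed by two levels named 'n' and 'v'.
df_multi = pd.DataFrame(
    {"a": [4, 5, 6], "b": [7, 8, 9], "c": [10, 11, 12]},
    index=pd.MultiIndex.from_tuples(
        [("d", 1), ("d", 2), ("e", 2)], names=["n", "v"]
    ),
)

same = df_cols.equals(df_rows)             # both constructions agree
last_c = df_multi.loc[("e", 2), "c"]       # look up one cell by (n, v)
```

With a MultiIndex, a row is addressed by a tuple holding one label per index level.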

Method Chaining
Most pandas methods return a DataFrame so that another pandas method can be applied to the result. This improves readability of code.
df = (pd.melt(df)
        .rename(columns={'variable': 'var', 'value': 'val'})
        .query('val >= 200')
     )
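A minimal runnable version of the chain above. The input values are invented so that some of them pass the `val >= 200` filter, and `wide` and `tidy` are illustrative names.

```python
import pandas as pd

# Hypothetical wide table with two measurement columns.
wide = pd.DataFrame({"a": [100, 250], "b": [300, 150]})

tidy = (
    pd.melt(wide)                                         # columns: variable, value
    .rename(columns={"variable": "var", "value": "val"})  # shorter names
    .query("val >= 200")                                  # keep large values only
)
```

Wrapping the chain in parentheses lets each step sit on its own line without backslashes.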
Subset Observations (Rows)

df[df.Length > 7]
Extract rows that meet logical criteria.
df.drop_duplicates()
Remove duplicate rows (only considers columns).
df.head(n)
Select first n rows.
df.tail(n)
Select last n rows.
df.sample(frac=0.5)
Randomly select fraction of rows.
df.sample(n=10)
Randomly select n rows.
df.iloc[10:20]
Select rows by position.
df.nlargest(n, 'value')
Select and order top n entries.
df.nsmallest(n, 'value')
Select and order bottom n entries.

Subset Variables (Columns)

df[['width','length','species']]
Select multiple columns with specific names.
df['width'] or df.width
Select single column with specific name.
df.filter(regex='regex')
Select columns whose name matches regular expression regex.
df.loc[:,'x2':'x4']
Select all columns between x2 and x4 (inclusive).
df.iloc[:,[1,2,5]]
Select columns in positions 1, 2 and 5 (first column is 0).
df.loc[df['a'] > 10, ['a','c']]
Select rows meeting logical condition, and only the specific columns.

Logic in Python (and pandas)

<     Less than                  !=                          Not equal to
>     Greater than               df.column.isin(values)      Group membership
==    Equals                     pd.isnull(obj)              Is NaN
<=    Less than or equals        pd.notnull(obj)             Is not NaN
>=    Greater than or equals     &,|,~,^,df.any(),df.all()   Logical and, or, not, xor, any, all

regex (Regular Expressions) Examples

'\.'                 Matches strings containing a period '.'
'Length$'            Matches strings ending with word 'Length'
'^Sepal'             Matches strings beginning with the word 'Sepal'
'^x[1-5]$'           Matches strings beginning with 'x' and ending with 1,2,3,4,5
'^(?!Species$).*'    Matches strings except the string 'Species'

http://pandas.pydata.org/
This cheat sheet inspired by Rstudio Data Wrangling Cheatsheet (https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf)
Written by Irv Lustig, Princeton Consultants
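The row and column selections listed above, together with the regex patterns, can be tried on a small table. This is only a sketch: the data is invented, and the column names are chosen to echo the 'Sepal', 'Length' and 'Species' patterns in the regex examples.

```python
import pandas as pd

# Hypothetical data; column names mirror the regex examples.
df = pd.DataFrame(
    {
        "Sepal.Length": [5.1, 7.0, 6.3, 7.7],
        "Sepal.Width": [3.5, 3.2, 3.3, 3.8],
        "Petal.Length": [1.4, 4.7, 6.0, 6.7],
        "Species": ["setosa", "versicolor", "virginica", "virginica"],
    }
)

# Rows meeting a logical criterion (bracket access, since the name has a dot).
long_sepals = df[df["Sepal.Length"] > 6.5]

# Combine conditions with & and ~; each condition needs its own parentheses.
not_virginica = df[(df["Sepal.Length"] > 6) & ~df["Species"].isin(["virginica"])]

# Columns whose names match a regular expression.
length_cols = df.filter(regex=r"Length$")
sepal_cols = df.filter(regex=r"^Sepal")
no_species = df.filter(regex=r"^(?!Species$).*")

# Rows by condition and columns by label in one step.
subset = df.loc[df["Petal.Length"] > 4, ["Sepal.Width", "Species"]]

# Top two rows ordered by a column.
top2 = df.nlargest(2, "Petal.Length")
```

Python's `and`, `or` and `not` do not work element-wise on Series, which is why the sheet lists `&`, `|` and `~` instead.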

Reshaping Data – Change the layout of a data set

pd.melt(df)
Gather columns into rows.
df.pivot(columns='var', values='val')
Spread rows into columns.
pd.concat([df1,df2])
Append rows of DataFrames.
pd.concat([df1,df2], axis=1)
Append columns of DataFrames.
df.sort_values('mpg')
Order rows by values of a column (low to high).
df.sort_values('mpg',ascending=False)
Order rows by values of a column (high to low).
df.rename(columns = {'y':'year'})
Rename the columns of a DataFrame.
df.sort_index()
Sort the index of a DataFrame.
df.reset_index()
Reset index of DataFrame to row numbers, moving index to columns.
df.drop(columns=['Length','Height'])
Drop columns from DataFrame.
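To see how melt and pivot undo each other, here is a small round trip along with the two concat forms. The data and variable names are illustrative. The sketch passes the row labels through `id_vars` so that `pivot` has an index to rebuild, which is one possible way to do it.

```python
import pandas as pd

wide = pd.DataFrame({"a": [4, 5, 6], "b": [7, 8, 9]}, index=[1, 2, 3])

# Gather columns into rows, keeping the old row labels in a column
# named 'index' so they can be restored later.
long = pd.melt(wide.reset_index(), id_vars="index")

# Spread rows back into columns.
back = long.pivot(index="index", columns="variable", values="value")

# Append rows, then append columns.
stacked = pd.concat([wide, wide])
side_by_side = pd.concat(
    [wide, wide.rename(columns={"a": "c", "b": "d"})], axis=1
)

# Order rows from high to low on one column.
descending = wide.sort_values("a", ascending=False)
```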
Summarize Data

df['w'].value_counts()
Count number of rows with each unique value of variable.
len(df)
# of rows in DataFrame.
df['w'].nunique()
# of distinct values in a column.
df.describe()
Basic descriptive statistics for each column (or GroupBy).

pandas provides a large set of summary functions that operate on different kinds of pandas objects (DataFrame columns, Series, GroupBy, Expanding and Rolling (see below)) and produce single values for each of the groups. When applied to a DataFrame, the result is returned as a pandas Series for each column. Examples:
sum()
Sum values of each object.
count()
Count non-NA/null values of each object.
median()
Median value of each object.
quantile([0.25,0.75])
Quantiles of each object.
apply(function)
Apply function to each object.
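The summary functions above can be tried on a small frame. The column names `w` and `h` are borrowed from the sheet's examples and the values are invented. One missing value is included to show that these functions skip NA by default.

```python
import pandas as pd

df = pd.DataFrame({"w": ["x", "y", "x", "x"], "h": [1.0, 2.0, 3.0, None]})

counts = df["w"].value_counts()          # rows per unique value of w
n_rows = len(df)                         # number of rows
n_unique = df["w"].nunique()             # distinct values in w

col_sums = df[["h"]].sum()               # DataFrame in, one value per column out
non_null = df["h"].count()               # the missing value is not counted
med = df["h"].median()                   # median of the non-missing values
q = df["h"].quantile([0.25, 0.75])       # several quantiles at once

# The same function applied to a GroupBy gives one value per group.
group_sums = df.groupby("w")["h"].sum()
```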

Handling Missing Data

df.dropna()
Drop rows with any column having NA/null data.
df.fillna(value)
Replace all NA/null data with value.
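A short sketch of both calls on invented data. The per-column fill using a dict is an additional form of `fillna`, shown here as one option alongside the scalar form listed above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0]})

# Keep only rows with no missing value in any column.
dropped = df.dropna()

# Replace every missing value with the same scalar.
filled = df.fillna(0)

# Replace missing values column by column (here: 0 for a, the mean for b).
per_column = df.fillna({"a": 0.0, "b": df["b"].mean()})
```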

Plotting
df.plot.hist()
Histogram for each column
df.plot.scatter(x='w',y='h')
Scatter chart using pairs of points.

Summary functions (continued)

min()
Minimum value in each object.
max()
Maximum value in each object.
mean()
Mean value of each object.
var()
Variance of each object.
std()
Standard deviation of each object.

Make New Columns

df.assign(Area=lambda df: df.Length*df.Height)
Compute and append one or more new columns.
df['Volume'] = df.Length*df.Height*df.Depth
Add single column.
pd.qcut(df.col, n, labels=False)
Bin column into n buckets.

pandas provides a large set of vector functions that operate on all columns of a DataFrame or a single selected column (a pandas Series). These functions produce vectors of values for each of the columns, or a single Series for the individual Series. Examples:
max(axis=1)
Element-wise max.
min(axis=1)
Element-wise min.
clip(lower=-10,upper=10)
Trim values at input thresholds.
abs()
Absolute value.

Combine Data Sets

adf          bdf
x1  x2       x1  x3
A   1        A   T
B   2        B   F
C   3        D   T

Standard Joins

pd.merge(adf, bdf, how='left', on='x1')
Join matching rows from bdf to adf.
x1  x2  x3
A   1   T
B   2   F
C   3   NaN

pd.merge(adf, bdf, how='right', on='x1')
Join matching rows from adf to bdf.
x1  x2   x3
A   1.0  T
B   2.0  F
D   NaN  T

pd.merge(adf, bdf, how='inner', on='x1')
Join data. Retain only rows in both sets.
x1  x2  x3
A   1   T
B   2   F

pd.merge(adf, bdf, how='outer', on='x1')
Join data. Retain all values, all rows.
x1  x2   x3
A   1    T
B   2    F
C   3    NaN
D   NaN  T

Filtering Joins

adf[adf.x1.isin(bdf.x1)]
All rows in adf that have a match in bdf.
x1  x2
A   1
B   2

adf[~adf.x1.isin(bdf.x1)]
All rows in adf that do not have a match in bdf.
x1  x2
C   3

Set-like Operations

ydf          zdf
x1  x2       x1  x2
A   1        B   2
B   2        C   3
C   3        D   4

pd.merge(ydf, zdf)
Rows that appear in both ydf and zdf (Intersection).
x1  x2
B   2
C   3

pd.merge(ydf, zdf, how='outer')
Rows that appear in either or both ydf and zdf (Union).
x1  x2
A   1
B   2
C   3
D   4

pd.merge(ydf, zdf, how='outer', indicator=True)
  .query('_merge == "left_only"')
  .drop(columns=['_merge'])
Rows that appear in ydf but not zdf (Setdiff).
x1  x2
A   1

Group Data

df.groupby(by="col")
Return a GroupBy object, grouped by values in column named "col".
df.groupby(level="ind")
Return a GroupBy object, grouped by values in index level named "ind".

All of the summary functions listed above can be applied to a group. Additional GroupBy functions:
size()
Size of each group.
agg(function)
Aggregate group using function.

The examples below can also be applied to groups. In this case, the function is applied on a per-group basis, and the returned vectors are of the length of the original DataFrame.
shift(1)
Copy with values shifted by 1.
shift(-1)
Copy with values lagged by 1.
rank(method='dense')
Ranks with no gaps.
rank(method='min')
Ranks. Ties get min rank.
rank(pct=True)
Ranks rescaled to interval [0, 1].
rank(method='first')
Ranks. Ties go to first value.
cumsum()
Cumulative sum.
cummax()
Cumulative max.
cummin()
Cumulative min.
cumprod()
Cumulative product.

Windows

df.expanding()
Return an Expanding object allowing summary functions to be applied cumulatively.
df.rolling(n)
Return a Rolling object allowing summary functions to be applied to windows of length n.
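The Group Data and Windows entries above have no worked results in the sheet, so here is a small sketch on invented data. The column name `col` follows the sheet, and `val` and the variable names are illustrative. It contrasts per-group summaries, which give one value per group, with per-group vector functions, which give one value per original row.

```python
import pandas as pd

df = pd.DataFrame(
    {"col": ["a", "a", "b", "b", "b"], "val": [1, 2, 3, 4, 5]}
)
grouped = df.groupby(by="col")

# Summaries: one value per group.
sizes = grouped.size()
totals = grouped["val"].agg("sum")

# Vector functions: computed within each group, aligned to the original rows.
running = grouped["val"].cumsum()
previous = grouped["val"].shift(1)
dense_rank = grouped["val"].rank(method="dense")

# Windows over the whole column, ignoring groups.
rolling_mean = df["val"].rolling(2).mean()      # mean of each pair of neighbours
expanding_sum = df["val"].expanding().sum()     # sum of everything so far
```

The first row of each group has no predecessor, so `shift(1)` leaves it as NaN, and `rolling(2)` does the same for the first row overall.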
