Pandas - Cheatsheet

Uploaded by

Nandan Patkar

Pandas Cheat Sheet

by Justin1209 (Justin1209) via cheatography.com/101982/cs/21202/

Import the Pandas Module

import pandas as pd

Create a DataFrame

# Method 1: from a dictionary (keys become column names)
df1 = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe'],
    'address': ['13 Main St.', '46 Maple Ave.'],
    'age': [34, 28]
})
# Method 2: from a list of rows plus a list of column names
df2 = pd.DataFrame([
    ['John Smith', '123 Main St.', 34],
    ['Jane Doe', '456 Maple Ave.', 28],
    ['Joe Schmo', '9 Broadway', 51]
    ],
    columns=['name', 'address', 'age'])

Loading and Saving CSVs

# Load a CSV file into a DataFrame
df = pd.read_csv('my-csv-file.csv')
# Save a DataFrame to a CSV file
df.to_csv('new-csv-file.csv')
# Load a DataFrame in chunks (for large datasets)
# Initialize reader object: urb_pop_reader
urb_pop_reader = pd.read_csv('ind_pop_data.csv', chunksize=1000)
# Get the first DataFrame chunk: df_urb_pop
df_urb_pop = next(urb_pop_reader)

Inspect a DataFrame

df.head(5)    First 5 rows
df.info()     Statistics of columns (row count, null values, datatype)

Reshape (for Scikit)

import numpy as np
nums = np.array(range(1, 11))
-> [ 1 2 3 4 5 6 7 8 9 10]
nums = nums.reshape(-1, 1)
-> [[1],
    [2],
    [3],
    [4],
    [5],
    [6],
    [7],
    [8],
    [9],
    [10]]

You can think of reshape() as rotating this array. Rather than one big row of numbers, nums is now a big column of numbers: there is one number in each row.

Converting Datatypes

# Convert argument to numeric type
pandas.to_numeric(arg, errors="raise")

errors:
"raise" -> raise an exception
"coerce" -> invalid parsing will be set as NaN

DataFrame for Select Columns / Rows

df = pd.DataFrame([
    ['January', 100, 100, 23, 100],
    ['February', 51, 45, 145, 45],
    ['March', 81, 96, 65, 96],
    ['April', 80, 80, 54, 180],
    ['May', 51, 54, 54, 154],
    ['June', 112, 109, 79, 129]],
    columns=['month', 'east', 'north', 'south', 'west']
)

Select Columns

# Select one Column
clinic_north = df.north
# Reshape values for Scikit learn
clinic_north.values.reshape(-1, 1)
# Select multiple Columns
clinic_north_south = df[['north', 'south']]

Make sure that you have a double set of brackets [[ ]], or this command won't work!
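The two errors settings listed under Converting Datatypes can be compared with a minimal sketch like the one below. The input Series raw is a made-up example, not data from the cheat sheet.

```python
import pandas as pd

# Hypothetical input: two parseable strings and one that is not numeric
raw = pd.Series(["1", "2.5", "abc"])

# errors="coerce": values that cannot be parsed become NaN
coerced = pd.to_numeric(raw, errors="coerce")

# errors="raise" (the default): the first unparseable value raises ValueError
try:
    pd.to_numeric(raw, errors="raise")
    raised = False
except ValueError:
    raised = True

print(coerced.isna().tolist(), raised)
```

Coercing is usually the more convenient choice when cleaning messy columns, since the NaN values can be counted or dropped afterwards.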

By Justin1209 (Justin1209), cheatography.com/justin1209/. Published 23rd November, 2019. Last updated 31st January, 2020.
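The chunked loading shown under Loading and Saving CSVs could be used roughly as follows. This is a sketch under assumptions: the CSV text, its column names, and the chunk size of 2 are invented so the example runs without the file 'ind_pop_data.csv'.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a large file on disk
csv_text = "country,year,population\n" + "\n".join(
    f"C{i},2000,{i * 10}" for i in range(5)
)

# With chunksize set, read_csv returns an iterator of DataFrames
reader = pd.read_csv(io.StringIO(csv_text), chunksize=2)

# next() pulls the first chunk (here: the first 2 rows)
first_chunk = next(reader)
total = first_chunk["population"].sum()

# Iterating over the reader yields the remaining chunks
for chunk in reader:
    total += chunk["population"].sum()

print(len(first_chunk), total)
```

Only one chunk is held in memory at a time, which is the point of the technique for files that do not fit in RAM.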

Select Rows

# Select one Row
march = df.iloc[2]
# Select multiple Rows
jan_feb_march = df.iloc[:3]
feb_march_april = df.iloc[1:4]
may_june = df.iloc[-2:]
# Select Rows with Logic
january = df[df.month == 'January']
-> <, >, <=, >=, !=, ==
march_april = df[(df.month == 'March') | (df.month == 'April')]
-> &, |
january_february_march = df[df.month.isin(['January', 'February', 'March'])]
-> column_name.isin([" ", " "])

Selecting a subset of a DataFrame often results in non-consecutive indices.
Using .reset_index() will create a new DataFrame and move the old indices into a new column called index.
Use .reset_index(drop=True) if you don't need the index column.
Use .reset_index(inplace=True) to prevent a new DataFrame from being created.

Adding a Column

df = pd.DataFrame([
    [1, '3 inch screw', 0.5, 0.75],
    [2, '2 inch nail', 0.10, 0.25],
    [3, 'hammer', 3.00, 5.50],
    [4, 'screwdriver', 2.50, 3.00]
    ],
    columns=['Product ID', 'Description', 'Cost to Manufacture', 'Price']
)
# Add a Column with specified row-values
df['Sold in Bulk?'] = ['Yes', 'Yes', 'No', 'No']
# Add a Column with same value in every row
df['Is taxed?'] = 'Yes'
# Add a Column with calculation
df['Revenue'] = df['Price'] - df['Cost to Manufacture']

Performing Column Operation

df = pd.DataFrame([
    ['JOHN SMITH', 'john.smith@gmail.com'],
    ['Jane Doe', 'jdoe@yahoo.com'],
    ['joe schmo', 'joeschmo@hotmail.com']
    ],
    columns=['Name', 'Email'])
# Changing a column with an Operation
df['Name'] = df.Name.apply(str.lower)
-> str.lower, str.upper
# Perform a lambda Operation on a Column
get_last_name = lambda x: x.split(" ")[-1]
df['last_name'] = df.Name.apply(get_last_name)

Performing an Operation on Multiple Columns

df = pd.DataFrame([
    ["Apple", 1.00, "No"],
    ["Milk", 4.20, "No"],
    ["Paper Towels", 5.00, "Yes"],
    ["Light Bulbs", 3.75, "Yes"],
    ],
    columns=["Item", "Price", "Is taxed?"])
# Lambda Function
df['Price with Tax'] = df.apply(lambda row:
    row['Price'] * 1.075
    if row['Is taxed?'] == 'Yes'
    else row['Price'],
    axis=1
)

We apply a lambda to rows, as opposed to columns, when we want to perform functionality that needs to access more than one column at a time.
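The row-wise lambda for the taxed price can also be written as a named function, which some readers find easier to follow. The sketch below uses two of the rows from the table above; the names TAX_RATE and price_with_tax are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    [["Apple", 1.00, "No"], ["Paper Towels", 5.00, "Yes"]],
    columns=["Item", "Price", "Is taxed?"],
)

# Assumed rate, read off the 1.075 multiplier used in the lambda
TAX_RATE = 0.075


def price_with_tax(row):
    """Return the price including tax for one row of the DataFrame."""
    if row["Is taxed?"] == "Yes":
        return row["Price"] * (1 + TAX_RATE)
    return row["Price"]


# axis=1 passes each row (not each column) to the function
df["Price with Tax"] = df.apply(price_with_tax, axis=1)
print(df)
```

With axis=1 each call receives a whole row, so the function can read both 'Price' and 'Is taxed?' at once.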

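The effect of reset_index() after selecting rows with logic can be seen in a small sketch. The four-row DataFrame is a shortened, made-up version of the clinic table used for row selection.

```python
import pandas as pd

df = pd.DataFrame({
    "month": ["January", "February", "March", "April"],
    "north": [100, 45, 96, 80],
})

# Filtering keeps the original row labels, so the index has gaps
subset = df[df.month.isin(["February", "April"])]

# reset_index() returns a new DataFrame and keeps the old labels
# in a column named 'index'
with_old_index = subset.reset_index()

# drop=True discards the old labels instead of keeping them
clean = subset.reset_index(drop=True)

print(list(subset.index), list(with_old_index.columns), list(clean.index))
```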

Rename Columns

# Method 1: replace all column names at once
df.columns = ['NewName_1', 'NewName_2', 'NewName_3', '...']
# Method 2: rename selected columns
df.rename(columns={
    'OldName_1': 'NewName_1',
    'OldName_2': 'NewName_2'
}, inplace=True)

Using inplace=True lets us edit the original DataFrame.

Series vs. Dataframes

# Dataframe and Series
print(type(clinic_north))
# <class 'pandas.core.series.Series'>
print(type(df))
# <class 'pandas.core.frame.DataFrame'>
print(type(clinic_north_south))
# <class 'pandas.core.frame.DataFrame'>

In Pandas
- a series is a one-dimensional object that contains any type of data.
- a dataframe is a two-dimensional object that can hold multiple columns of different types of data.

A single column of a dataframe is a series, and a dataframe is a container of one or more series objects.

Column Statistics

Mean (average)             df.column.mean()
Median                     df.column.median()
Minimum Value              df.column.min()
Maximum Value              df.column.max()
Number of Values           df.column.count()
Number of Unique Values    df.column.nunique()
Standard Deviation         df.column.std()
List of Unique Values      df.column.unique()

Calculating Aggregate Functions

# Group By
# measurement() is a placeholder for an aggregate such as mean(), count() or max()
grouped = (df.groupby(['col1', 'col2']).col3
    .measurement()
    .reset_index())
# -> group by column1 and column2, calculate values of column3
# Percentile
import numpy as np
high_earners = (df.groupby('category').wage
    .apply(lambda x: np.percentile(x, 75))
    .reset_index())
# np.percentile can calculate any percentile over an array of values

Don't forget reset_index() at the end of a groupby operation.

Pivot Tables

orders = pd.read_csv('orders.csv')
shoe_counts = orders.groupby(['shoe_type', 'shoe_color']).id.count().reset_index()
shoe_counts_pivot = shoe_counts.pivot(
    index='shoe_type',
    columns='shoe_color',
    values='id').reset_index()

We have to build a temporary table where we group by the columns we want to include in the pivot table.

Merge (Same Column Name)

sales = pd.read_csv('sales.csv')
targets = pd.read_csv('targets.csv')
men_women = pd.read_csv('men_women_sales.csv')
# Method 1
sales_targets = pd.merge(sales, targets, how="inner")
# how: "inner" (default), "outer", "left", "right"
# Method 2 (Method Chaining)
all_data = sales.merge(targets).merge(men_women)
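To make the how options of a merge concrete, here is a small sketch with two invented tables that share a month column. The values are made up and do not come from 'sales.csv' or 'targets.csv'.

```python
import pandas as pd

# Hypothetical data: the tables overlap on Feb and Mar only
sales = pd.DataFrame({"month": ["Jan", "Feb", "Mar"], "revenue": [300, 290, 310]})
targets = pd.DataFrame({"month": ["Feb", "Mar", "Apr"], "target": [280, 320, 300]})

# Default how="inner": only months present in both tables
inner = pd.merge(sales, targets)

# how="outer": every month from either table, gaps filled with NaN
outer = pd.merge(sales, targets, how="outer")

# how="left": every row of sales, with target where one exists
left = sales.merge(targets, how="left")

print(len(inner), len(outer), len(left))
```

Because both tables have a column named month, pandas uses it as the join key without being told.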

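The two-step pivot (group first, then pivot) can be tried on a tiny in-memory table. The orders data below is invented so that the example runs without 'orders.csv'; the column names follow the Pivot Tables snippet.

```python
import pandas as pd

# Hypothetical orders table
orders = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "shoe_type": ["boots", "boots", "sandals", "sandals", "boots"],
    "shoe_color": ["black", "brown", "black", "black", "black"],
})

# Step 1: temporary table with one row per (shoe_type, shoe_color) pair
shoe_counts = (
    orders.groupby(["shoe_type", "shoe_color"]).id.count().reset_index()
)

# Step 2: spread shoe_color across the columns
shoe_counts_pivot = shoe_counts.pivot(
    index="shoe_type", columns="shoe_color", values="id"
).reset_index()

print(shoe_counts_pivot)
```

Combinations that never occur in the data (here brown sandals) show up as NaN in the pivoted table.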

Inner Merge (Different Column Name)

orders = pd.read_csv('orders.csv')
products = pd.read_csv('products.csv')
# Method 1: Rename Columns
orders_products = pd.merge(
    orders,
    products.rename(columns={'id': 'product_id'}),
    how="inner"
).reset_index()
# how: "inner" (default), "outer", "left", "right"
# Method 2:
orders_products = pd.merge(orders, products,
    left_on="product_id",
    right_on="id",
    suffixes=["_orders", "_products"])

Method 2:
If we use this syntax, we'll end up with two columns called id.
Pandas won't let you have two columns with the same name, so it will change them to id_x and id_y.
We can help make them more useful by using the keyword suffixes.

Concatenate

bakery = pd.read_csv('bakery.csv')
ice_cream = pd.read_csv('ice_cream.csv')
menu = pd.concat([bakery, ice_cream])

Melt

pandas.melt(DataFrame, id_vars, value_vars, var_name, value_name='value')

id_vars: Column(s) to use as identifier variables.
value_vars: Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars.
var_name: Name to use for the 'variable' column.
value_name: Name to use for the 'value' column.

Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.

Assert Statements

import numpy as np
# Test if country is of type object
assert gapminder.country.dtypes == object
# Test if year is of type int64
assert gapminder.year.dtypes == np.int64
# Test if life_expectancy is of type float64
assert gapminder.life_expectancy.dtypes == np.float64
# Assert that country does not contain any missing values
assert pd.notnull(gapminder.country).all()
# Assert that year does not contain any missing values
assert pd.notnull(gapminder.year).all()
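A short sketch of melt on a wide table may help with the parameter descriptions in the Melt section. The table and the names clinic and visits are made up for illustration.

```python
import pandas as pd

# Hypothetical wide table: one column per clinic
wide = pd.DataFrame({
    "month": ["January", "February"],
    "north": [100, 45],
    "south": [23, 145],
})

# Unpivot: 'month' stays as the identifier, the clinic columns
# become (clinic, visits) pairs, one per row
long = pd.melt(
    wide,
    id_vars="month",
    value_vars=["north", "south"],
    var_name="clinic",
    value_name="visits",
)

print(long)
```

Each of the two value columns contributes one row per month, so the long table has four rows.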

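The suffixes behaviour described for Method 2 of the inner merge can be checked with two small invented tables. The rows below are assumptions for illustration; only the column names id and product_id come from the cheat sheet.

```python
import pandas as pd

# Hypothetical tables: both have an 'id' column with different meanings
orders = pd.DataFrame({"id": [1, 2], "product_id": [10, 11], "quantity": [3, 1]})
products = pd.DataFrame({"id": [10, 11], "description": ["hammer", "nail"]})

# Without suffixes pandas would name the clashing columns id_x and id_y
default_names = pd.merge(orders, products, left_on="product_id", right_on="id")

# With suffixes the origin of each id column is visible in its name
merged = pd.merge(
    orders,
    products,
    left_on="product_id",
    right_on="id",
    suffixes=["_orders", "_products"],
)

print(list(default_names.columns))
print(list(merged.columns))
```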
